Issue about memory order on ARM
The code in GetSnapshotData() that reads the `xid` field of the PGXACT struct depends on the code in GetNewTransactionId() that writes `MyPgXact->xid`: the store of the xid must happen before the load of it. In C11 terms, [Release-Acquire ordering](https://en.cppreference.com/w/c/atomic/memory_order#Release-Acquire_ordering) would provide this guarantee. But currently there is no special operation to enforce it (and the [volatile](https://en.cppreference.com/w/c/language/volatile) keyword plays no role in this situation). This means that when a backend A returns from GetNewTransactionId(), the new value of `MyPgXact->xid` may still sit in a CPU store buffer or cache line, so the new value is not yet visible to a backend B that is calling GetSnapshotData(). So the assumption that 'all top-level XIDs <= latestCompletedXid are either present in the ProcArray, or not running anymore' may not be safe.
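For illustration, a minimal C11 sketch of the pairing described above (the names `pgxact_xid`, `assign_xid` and `read_xid` are stand-ins for this sketch, not PostgreSQL code): the writer publishes the xid with a release store and the reader uses an acquire load, so once the reader observes the new value, every store sequenced before the release is visible to it as well.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Stand-in for MyPgXact->xid; in this sketch it is a single atomic. */
static _Atomic uint32_t pgxact_xid;

/* Writer side: what GetNewTransactionId() would need to do. */
void
assign_xid(uint32_t newxid)
{
    /* release: all prior stores become visible to an acquiring reader */
    atomic_store_explicit(&pgxact_xid, newxid, memory_order_release);
}

/* Reader side: what GetSnapshotData() would need to do. */
uint32_t
read_xid(void)
{
    /* acquire: pairs with the release store above */
    return atomic_load_explicit(&pgxact_xid, memory_order_acquire);
}
```

Without such a release/acquire pair (or an equivalent memory barrier), nothing prevents the new value from lingering in a store buffer from the reader's point of view, which is exactly the hazard described above.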
Re: [HACKERS] Block level parallel vacuum
On Sat, Nov 30, 2019 at 7:18 PM Sergei Kornilov wrote: > Hello > > Is it possible to change the order of index processing by the parallel leader? In > v35 patchset I see the following order: > - start parallel processes > - leader and parallel workers process the index list and possibly skip some > entries > - after that the parallel leader rechecks the index list and processes the skipped > indexes > - WaitForParallelWorkersToFinish > > I think it would be better to: > - start parallel processes > - parallel leader goes through the index list and processes only indexes which > are skip_parallel_index_vacuum = true > - parallel workers process indexes with skip_parallel_index_vacuum = > false > - parallel leader starts participating in the remaining parallel-safe index > processing > - WaitForParallelWorkersToFinish > > This would mean less running time and better load balance across leader and > workers in case of few non-parallel and few parallel indexes. > Why do you think so? I think the advantage of the current approach is that once the parallel workers are launched, the leader can process indexes that don't support parallelism. So, both types of indexes can be processed at the same time. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Do we have a CF manager for November?
On Sun, Dec 1, 2019 at 9:23 AM Michael Paquier wrote: > On Thu, Nov 28, 2019 at 04:02:55PM +0900, Michael Paquier wrote: > > So, we are close to the end of this commit fest, and I have done a > > first pass on something like one third of the entries, mainly updating > > incorrect patch status, bumping them into next CF or closing stale > > items waiting on author. > > I have worked more on the CF. And here are the final results: > Committed: 36. > Moved to next CF: 146. > Withdrawn: 4. > Rejected: 6. > Returned with Feedback: 29. > Total: 221. > > The results are very comparable with the last CF, where 39 were marked > as committed and 28 as returned with feedback. I know that we are > still a couple of hours before officially being in December AoE > (exactly 9), so I am a bit ahead. My apologies about that. > I have been able to go through each patch, and pinged each author > where needed. Thank you for your efforts. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Do we have a CF manager for November?
On Thu, Nov 28, 2019 at 04:02:55PM +0900, Michael Paquier wrote: > So, we are close to the end of this commit fest, and I have done a > first pass on something like one third of the entries, mainly updating > incorrect patch status, bumping them into next CF or closing stale > items waiting on author. I have worked more on the CF. And here are the final results: Committed: 36. Moved to next CF: 146. Withdrawn: 4. Rejected: 6. Returned with Feedback: 29. Total: 221. The results are very comparable with the last CF, where 39 were marked as committed and 28 as returned with feedback. I know that we are still a couple of hours before officially being in December AoE (exactly 9), so I am a bit ahead. My apologies about that. I have been able to go through each patch, and pinged each author where needed. By the way, the cfbot has been extremely helpful in this exercise: http://commitfest.cputube.org/ Thanks, -- Michael signature.asc Description: PGP signature
Re: [PROPOSAL] Temporal query processing with range types
On Thu, Aug 08, 2019 at 09:58:31AM +0200, Peter Moser wrote: > Thanks a lot for your effort. We are now trying to put again more work > and time in this patch. > We are grateful for any feedback. The latest patch applies, but does not build because of an OID conflict. For development purposes, please make sure to use an OID in the range 8000~9000 which are reserved for development per the recently-added new project policy. For now I have moved the patch to next CF, waiting on author. -- Michael signature.asc Description: PGP signature
Re: NOT IN subquery optimization
On Wed, Jun 26, 2019 at 09:26:16PM +, Li, Zheng wrote: > Let me know if you have any comments. I have one: the latest patch visibly applies, but fails to build because of the recent API changes around lists in the backend code. So a rebase is in order. The discussion has not moved an iota in the last couple of months; still, as the latest patch has not received reviews, I have moved it to next CF waiting on author. -- Michael signature.asc Description: PGP signature
Re: Protect syscache from bloating with negative cache entries
On Tue, Nov 19, 2019 at 07:48:10PM +0900, Kyotaro Horiguchi wrote: > I'd like to throw in food for discussion on how much SearchSysCacheN > suffers degradation from some choices on how we can insert a code into > the SearchSysCacheN code path. Please note that the patch has a warning, causing cfbot-san to complain: catcache.c:786:1: error: no previous prototype for ‘CatalogCacheFlushCatalog2’ [-Werror=missing-prototypes] CatalogCacheFlushCatalog2(Oid catId) ^ cc1: all warnings being treated as errors So this should at least be fixed. For now I have moved it to next CF, waiting on author. -- Michael signature.asc Description: PGP signature
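As a side note, the warning quoted above is easy to reproduce and fix in isolation. A minimal sketch (with `Oid` replaced by a local typedef, since this is not compiled against the backend headers):

```c
#include <stdint.h>

typedef uint32_t Oid;           /* stand-in for PostgreSQL's Oid */

/* -Wmissing-prototypes fires when a function with external linkage is
 * defined without a prior declaration.  Declaring it first (normally in
 * the matching header file) silences the warning: */
void CatalogCacheFlushCatalog2(Oid catId);

void
CatalogCacheFlushCatalog2(Oid catId)
{
    (void) catId;               /* real flush logic elided in this sketch */
}
```

Alternatively, marking the function `static` works when it is only used inside the file, which is why the warning is usually a hint that either a header entry or a `static` qualifier is missing.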
Re: extension patch of CREATE OR REPLACE TRIGGER
On Thu, Aug 01, 2019 at 09:49:53PM +1200, Thomas Munro wrote: > The end of CF1 is here. I've moved this patch to CF2 (September) in > the Commitfest app. Of course, everyone is free to continue > discussing the patch before then. When you have a new version, please > set the status to "Needs review". The latest patch includes calls to heap_open(), causing its compilation to fail. Could you please send a rebased version of the patch? I have moved the entry to next CF, waiting on author. -- Michael signature.asc Description: PGP signature
Re: bitmaps and correlation
On Sat, Nov 02, 2019 at 03:26:17PM -0500, Justin Pryzby wrote: > Attached is a fixed and rebasified patch for cfbot. > Included inline for conceptual review. Your patch still causes two regression tests to fail per Mr Robot's report: join and select. Could you look at those problems? I have moved the patch to next CF, waiting on author. -- Michael signature.asc Description: PGP signature
Re: [PATCH] Implement INSERT SET syntax
On Fri, Nov 22, 2019 at 12:24:15PM +1300, Gareth Palmer wrote: > Attached is an updated patch with for_locking_clause added, test-cases > re-use existing tables and the comments and documentation have been > expanded. Per the automatic patch tester, documentation included in the patch does not build. Could you please fix that? I have moved the patch to next CF, waiting on author. -- Michael signature.asc Description: PGP signature
Re: [HACKERS] Cached plans and statement generalization
On Thu, Sep 26, 2019 at 10:23:38AM +0300, Konstantin Knizhnik wrote: > Sorry, > New version of the patch with corrected expected output for rules test is > attached. It looks like the documentation is failing to build. Could you fix that? There may be other issues as well. I have moved the patch to next CF, waiting on author. -- Michael signature.asc Description: PGP signature
Re: Asymmetric partition-wise JOIN
On Sat, Aug 24, 2019 at 05:33:01PM +0900, Kohei KaiGai wrote: > On the other hand, it eventually consumes almost the equivalent amount > of memory to load the inner relations, if no leaves are pruned, and if we > could extend the Hash-node to share the hash-table with sibling > join-nodes. The patch crashes when running the regression tests, per the report of the automatic patch tester. Could you look at that? I have moved the patch to next CF, waiting on author. -- Michael signature.asc Description: PGP signature
Re: Control your disk usage in PG: Introduction to Disk Quota Extension
On Fri, Sep 27, 2019 at 11:30:08AM +0800, Haozhou Wang wrote: > I rebased this patch with the newest master branch. Attached the new > patch disk_quota_hooks_v5.patch in the attachment. This again needs a rebase, so I have switched it as waiting on author. -- Michael signature.asc Description: PGP signature
Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
On Fri, Nov 01, 2019 at 09:38:37AM +0900, Moon, Insung wrote: > Of course, I may not have written the excellent quality code > correctly, so I will make an interim report if possible. The last patch has rotten, and does not apply anymore. A rebase would be nice, so I am switching the patch to waiting on author, and have moved it to next CF. The discussion has been going on for a long time. -- Michael signature.asc Description: PGP signature
Re: Speed up transaction completion faster after many relations are accessed in a transaction
On Thu, Sep 26, 2019 at 07:11:53AM +, Tsunakawa, Takayuki wrote: > I'm sorry to repeat what I mentioned in my previous mail, but my v2 > patch's approach is based on the database textbook and seems > intuitive. So I attached the rebased version. If you wish to do so, that's fine by me, but I have not dived into the details of the thread much. Please note anyway that the patch does not apply anymore and that it needs a rebase. So for now I have moved the patch to next CF, waiting on author. -- Michael signature.asc Description: PGP signature
Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line
On Thu, Sep 26, 2019 at 03:08:22PM +0300, Alexey Kondratov wrote: > As Alvaro correctly pointed out in the nearby thread [1], we've got an > interference regarding the -R command line argument. I agree that it's a good > idea to reserve -R for recovery configuration write to be consistent with > pg_basebackup, so I've updated my patch to use other letters: The patch has rotten and does not apply anymore. Could you please send a rebased version? I have moved the patch to next CF, waiting on author for now. -- Michael signature.asc Description: PGP signature
Re: Implementing Incremental View Maintenance
On Fri, Nov 29, 2019 at 06:19:54PM +0900, Yugo Nagata wrote: > Sorry, an unfinished line was left... Please ignore this. A rebase looks to be necessary, Mr Robot complains that the patch does not apply cleanly. As the thread is active recently, I have moved the patch to next CF, waiting on author. -- Michael signature.asc Description: PGP signature
Re: pglz performance
On Fri, Nov 29, 2019 at 10:18:18AM +0500, Andrey Borodin wrote: >> On 29 Nov 2019, at 3:43, Tomas Vondra >> wrote: >> >> OK, pushed, with some minor cosmetic tweaks on the comments (essentially >> using the formatting trick pointed out by Alvaro), and removing one >> unnecessary change in pglz_maximum_compressed_size. > > Cool, thanks! Yippee. Thanks. -- Michael signature.asc Description: PGP signature
Re: Runtime pruning problem
On Sat, Nov 30, 2019 at 09:43:35PM -0500, Tom Lane wrote: > Fair enough, but I did actually spend some time on the issue today. > Just to cross-link this thread to the latest, see > > https://www.postgresql.org/message-id/12424.1575168015%40sss.pgh.pa.us Thanks, just saw the update. -- Michael signature.asc Description: PGP signature
Re: Runtime pruning problem
Michael Paquier writes: > On Thu, Sep 12, 2019 at 10:24:13AM -0400, Tom Lane wrote: >> It's on my to-do list, but I'm not sure how soon I'll get to it. > Seems like it is better to mark this CF entry as returned with > feedback then. Fair enough, but I did actually spend some time on the issue today. Just to cross-link this thread to the latest, see https://www.postgresql.org/message-id/12424.1575168015%40sss.pgh.pa.us regards, tom lane
Re: Tid scan improvements
On Thu, Sep 05, 2019 at 01:06:56PM +1200, Edmund Horner wrote: > So, I think we need to either get some help from someone familiar with > heapam.c, or maybe shelve the patch. It has been work in progress for > over a year now. Okay, still nothing has happened after two months. Once this is solved a new patch submission could be done. For now I have marked the entry as returned with feedback. -- Michael signature.asc Description: PGP signature
Re: Online checksums patch - once again
On Wed, Oct 02, 2019 at 08:59:27PM +0200, Magnus Hagander wrote: > Much appreciated! The latest patch does not apply, could you send a rebase? Moved it to next CF, waiting on author. -- Michael signature.asc Description: PGP signature
Re: [PATCH] Opclass parameters
On Thu, Sep 12, 2019 at 02:16:34AM +0200, Tomas Vondra wrote: > I still think using procnum 0 and passing the data through fn_expr are not > the right solution. Firstly, traditionally the amprocs are either required > or optional, with required procs having low procnums and optional starting > at 11 or so. The 0 breaks this, because it's optional but it contradicts > the procnum rule. Also, what happens if we need to add another optional > amproc defined for all AMs? Surely we won't use -1. > > IMHO we should keep AM-specific procnum and pass it somehow to the AM > machinery. The latest review has not been addressed, and this was 7 weeks ago. So I am marking the patch as returned with feedback. -- Michael signature.asc Description: PGP signature
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Fri, Nov 22, 2019 at 01:18:11PM +0530, Dilip Kumar wrote: > I have rebased the patch on the latest head and also fix the issue of > "concurrent abort handling of the (sub)transaction." and attached as > (v1-0013-Extend-handling-of-concurrent-aborts-for-streamin) along with > the complete patch set. I have added the version number so that we > can track the changes. The patch has rotten a bit and does not apply anymore. Could you please send a rebased version? I have moved it to next CF, waiting on author. -- Michael signature.asc Description: PGP signature
Re: [HACKERS] Transactions involving multiple postgres foreign servers, take 2
On Wed, Sep 04, 2019 at 12:44:20PM +0900, Masahiko Sawada wrote: > I forgot to include some new header files. Attached the updated patches. No reviews since and the patch does not apply anymore. I am moving it to next CF, waiting on author. -- Michael signature.asc Description: PGP signature
Re: Undo logs
On Thu, Feb 07, 2019 at 07:35:31AM +0530, Andres Freund wrote: > It was JUST added ... :) thought I saw you reply on the other thread > about it, but I was wrong... Six months later without any activity, I am marking this entry as returned with feedback. The latest patch set does not apply anymore, so having a rebase would be nice if submitted again. -- Michael signature.asc Description: PGP signature
Re: Auxiliary Processes and MyAuxProc
On Thu, Oct 03, 2019 at 11:39:37AM -0700, Andres Freund wrote: > Color me unconvinced. The latest comments of the thread have not been addressed yet, so I am marking the patch as returned with feedback. If you think that's not correct, please feel free to update the status of the patch. If you do so, please provide at the same time a rebased version of the patch, as it is failing to apply on HEAD. -- Michael signature.asc Description: PGP signature
Re: shared-memory based stats collector
On Fri, Sep 27, 2019 at 09:46:47AM +0900, Kyotaro Horiguchi wrote: > Affected by the code movement in 9a86f03b4e. Just > rebased. Thanks. This does not apply anymore. Could you provide a rebase? I have moved the patch to next CF, waiting on author. Thanks, -- Michael signature.asc Description: PGP signature
Re: Global shared meta cache
On Wed, Nov 06, 2019 at 02:55:30AM +, ideriha.take...@fujitsu.com wrote: > Thank you for the reply. Latest patch does not apply. Please send a rebase. Patch moved to next CF, waiting on author. Bip. -- Michael signature.asc Description: PGP signature
Re: pause recovery if pitr target not reached
On Fri, Nov 22, 2019 at 11:26:59AM +, Leif Gunnar Erlandsen wrote: > No it does not. It works well to demonstrate its purpose though. > And it might be that stopping with FATAL would be more correct. This is still under active discussion. Please note that the latest patch does not apply, so a rebase would be nice to have. I have moved the patch to next CF, waiting on author. -- Michael signature.asc Description: PGP signature
Re: Yet another fast GiST build
On Sun, Sep 08, 2019 at 01:54:35PM +0500, Andrey Borodin wrote: > Here's V3 of the patch set. > Changes: > 1. Added some documentation of new sort support routines > 2. Fixed bug with dirty pages > > I did not add sort support procs to built-in boxes, circles and > polys, since it may not be the optimal index for them. However, for > points Z-order is quite good as a default. The latest patch does not apply. Could you send a rebase? I have moved the patch to next CF, waiting on author for now. -- Michael signature.asc Description: PGP signature
Re: Yet another vectorized engine
On Thu, Nov 28, 2019 at 05:23:59PM +0800, Hubert Zhang wrote: > Note that the vectorized executor engine is based on PG9.6 now, but it > could be ported to master / zedstore with some effort. We would appreciate > some feedback before moving further in that direction. There has been no feedback yet, unfortunately. The patch does not apply anymore, so a rebase is necessary. For now I am moving the patch to next CF, waiting on author. -- Michael signature.asc Description: PGP signature
Re: Parallel grouping sets
On Thu, Nov 28, 2019 at 07:07:22PM +0800, Pengzhou Tang wrote: > Richard pointed out that he got incorrect results with the patch I > attached; there were bugs somewhere. > I have fixed them now and attached the newest version; please refer to [1] for > the fix. Mr Robot is reporting that the latest patch fails to build at least on Windows. Could you please send a rebase? I have moved the patch to next CF for now, waiting on author. -- Michael signature.asc Description: PGP signature
Re: Copy data to DSA area
On Fri, Oct 18, 2019 at 12:53:16PM +, ideriha.take...@fujitsu.com wrote: > I added ShmZoneContext to my patch. > I haven't added detailed comments and test set, so let me explain how to > use it here. I followed Thomas' suggestion. The latest patch sent fails to apply. Could you please send a rebase? I am moving this patch to next CF, waiting on author. -- Michael signature.asc Description: PGP signature
Re: Runtime pruning problem
On Thu, Sep 12, 2019 at 10:24:13AM -0400, Tom Lane wrote: > It's on my to-do list, but I'm not sure how soon I'll get to it. Seems like it is better to mark this CF entry as returned with feedback then. -- Michael signature.asc Description: PGP signature
Re: Global temporary tables
On Wed, Nov 20, 2019 at 07:32:14PM +0300, Konstantin Knizhnik wrote: > Now pg_gtt_statistic view is provided for global temp tables. Latest patch fails to apply, per Mr Robot's report. Could you please rebase and send an updated version? For now I have moved the patch to next CF, waiting on author. -- Michael signature.asc Description: PGP signature
Re: WIP: BRIN multi-range indexes
On Thu, Sep 26, 2019 at 09:01:48PM +0200, Tomas Vondra wrote: > Yeah, the opclass params patches got broken by 773df883e adding enum > reloptions. The breakage is somewhat extensive so I'll leave it up to > Nikita to fix it in [1]. Until that happens, apply the patches on > top of caba97a9d9 for review. This has been close to two months now, so I have marked the patch as RwF. Feel free to update it if you think that's incorrect. -- Michael signature.asc Description: PGP signature
Re: Allow 'sslkey' and 'sslcert' in postgres_fdw user mappings
On Thu, Oct 31, 2019 at 07:54:41PM -0400, Andrew Dunstan wrote: > This patch achieves $SUBJECT and also provides some testing of the > sslpassword setting. The patch does not apply anymore, so a rebase is needed. As it has not been reviewed, I am moving it to next CF, waiting on author. -- Michael signature.asc Description: PGP signature
Re: Binary support for pgoutput plugin
Hi, On Mon, Nov 11, 2019 at 03:24:59PM -0500, Dave Cramer wrote: > Attached, The latest patch set does not apply correctly. Could you send a rebase please? I am moving the patch to next CF, waiting on author. -- Michael signature.asc Description: PGP signature
Re: Should we add xid_current() or a int8->xid cast?
On Sun, Dec 1, 2019 at 12:22 PM Mark Dilger wrote: > These two patches (v3) no longer apply cleanly. Could you please > rebase? Hi Mark, Thanks. Here's v4. 0001-Add-SQL-type-xid8-to-expose-FullTransactionId-to--v4.patch Description: Binary data 0002-Introduce-xid8-variants-of-the-txid_XXX-fmgr-func-v4.patch Description: Binary data
Protocol problem with GSSAPI encryption?
This came up recently on IRC, not sure if the report there was passed on at all. ProcessStartupPacket assumes that there will be only one negotiation request for an encrypted connection, but libpq is capable of issuing two: it will ask for GSS encryption first, if it looks like it will be able to do GSSAPI, and if the server refuses that it will ask (on the same connection) for SSL. But ProcessStartupPacket assumes that the packet after a failed negotiation of either kind will be the actual startup packet, so the SSL connection request is rejected with "unsupported version 1234.5679". I'm guessing this usually goes unnoticed because most people are probably not set up to do GSSAPI, and those who are are probably ok with using it for encryption. But if the client is set up for GSSAPI and the server not, then trying to do an SSL connection will fail when it should succeed, and PGGSSENCMODE=disable in the environment (or connect string) is necessary to get the connection to succeed. It seems to me that this is a bug in ProcessStartupPacket, which should accept both GSS or SSL negotiation requests on a connection (in either order). Maybe secure_done should be two flags rather than one? I'm not really familiar with the GSSAPI stuff so probably someone who is should take a look. -- Andrew (irc:RhodiumToad)
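A sketch of what "two flags rather than one" could look like (a hypothetical helper, not the actual ProcessStartupPacket code; the request codes match the protocol's NEGOTIATE_SSL_CODE and NEGOTIATE_GSS_CODE values): each kind of negotiation may be attempted once per connection, in either order, and only a repeated request of the same kind forces the next packet to be treated as the startup packet.

```c
#include <stdbool.h>

#define NEGOTIATE_SSL_CODE 80877103     /* i.e. version 1234.5679 */
#define NEGOTIATE_GSS_CODE 80877104     /* i.e. version 1234.5680 */

/* Per-connection state: track the two negotiations separately,
 * instead of a single secure_done flag. */
struct startup_state
{
    bool        ssl_done;
    bool        gss_done;
};

/* Returns true if this negotiation request may still be accepted on
 * this connection; false means it is a duplicate and the next packet
 * should be read as the actual startup packet. */
bool
negotiation_allowed(struct startup_state *st, int code)
{
    if (code == NEGOTIATE_SSL_CODE && !st->ssl_done)
    {
        st->ssl_done = true;
        return true;
    }
    if (code == NEGOTIATE_GSS_CODE && !st->gss_done)
    {
        st->gss_done = true;
        return true;
    }
    return false;
}
```

Under this scheme, libpq's sequence of "try GSS, then fall back to SSL on the same connection" would no longer be rejected with "unsupported version 1234.5679".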
Re: [Patch] Add a reset_computed_values function in pg_stat_statements
On Tue, Nov 05, 2019 at 11:03:58AM +1300, Thomas Munro wrote: > In contrib/pg_stat_statements/pg_stat_statements.c, you need to > declare or define entry_reset_computed() before you use it. I suppose > your compiler must be warning about that. I personally put > "COPT=-Wall -Werror" into src/Makefile.custom to make sure that I > don't miss any warnings. That's also what http://cfbot.cputube.org/ > does (a thing that does a full check-world on all registered patches > to look for such problems). Still the same problem which has not been addressed after a couple of weeks, so I am marking the entry as returned with feedback. -- Michael signature.asc Description: PGP signature
Re: [Proposal] Add accumulated statistics
On Wed, Oct 30, 2019 at 05:55:28AM +, imai.yoshik...@fujitsu.com wrote: > And here is the patch which counts the wait event and measuring the wait > event time. It is currently like POC and has several things to be improved. Please note the patch tester complains about the latest patch: pgstatfuncs.c: In function ‘pg_stat_get_waitaccum’: pgstatfuncs.c:2018:3: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement] Datum values[PG_STAT_GET_WAITACCUM_COLS]; I am moving it to next CF, marking it as waiting on author. -- Michael signature.asc Description: PGP signature
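The -Werror=declaration-after-statement failure quoted above means a variable was declared after executable statements; PostgreSQL builds with C90-style rules that forbid this. A minimal illustration of the corrected pattern (the function is invented for this sketch):

```c
#include <string.h>

int
sum_after_reset(void)
{
    int         values[4];      /* all declarations first... */
    int         i;
    int         sum = 0;

    memset(values, 0, sizeof(values));  /* ...then the statements */
    for (i = 0; i < 4; i++)
        sum += values[i];
    return sum;
}
```

In the reported case, moving the `Datum values[PG_STAT_GET_WAITACCUM_COLS]` declaration up to the top of its block would be the same fix.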
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Fri, Nov 29, 2019 at 03:01:46PM +0900, Michael Paquier wrote: On Sun, Sep 29, 2019 at 01:00:49AM +0200, Tomas Vondra wrote: OK. I'll try extending the set of synthetic queries in [1] to also do something like this and generate similar plans. Any progress on that? Please note that the latest patch does not apply anymore, so a rebase is needed. I am switching the patch to waiting on author for now. -- Ah, thanks for reminding me. I've added a couple more queries with two joins (there only were queries with two joins; I hadn't expected another join to make such a difference, but it seems I was wrong). So yes, there seem to be 6 different GUCs / places where considering incremental sort makes a difference (the numbers say how many of the 4960 tested combinations were affected):
- create_ordered_paths_parallel (50)
- create_partial_grouping_paths_2 (228)
- standard_join_search (94)
- add_paths_to_grouping_rel (2148)
- set_rel_pathlist (156)
- apply_scanjoin_target_to_paths (286)
Clearly some of the places are more important than others, plus there are some overlaps (two GUCs producing the same plan, etc.). Plus there are four GUCs that did not affect any queries at all:
- create_partial_grouping_paths
- gather_grouping_paths
- create_ordered_paths
- add_paths_to_grouping_rel_parallel
Anyway, this might serve as a way to prioritize the effort.
All the test changes are in the original repo at https://github.com/tvondra/incremental-sort-tests-2 and I'm also attaching the rebased patches - the changes were pretty minor, hopefully that helps others (all the patches with dev GUCs are in https://github.com/tvondra/postgres/tree/incremental-sort-20191129). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

>From 14beb5a9f3282d452844cffa86bafb59df831343 Mon Sep 17 00:00:00 2001
From: Tomas Vondra
Date: Fri, 29 Nov 2019 19:41:26 +0100
Subject: [PATCH 01/13] Consider low startup cost when adding partial path

45be99f8cd5d606086e0a458c9c72910ba8a613d added `add_partial_path` with
the comment:

> Neither do we need to consider startup costs:
> parallelism is only used for plans that will be run to completion.
> Therefore, this routine is much simpler than add_path: it needs to
> consider only pathkeys and total cost.

I'm not entirely sure if that is still true or not--I can't easily come
up with a scenario in which it's not, but I also can't come up with an
inherent reason why such a scenario cannot exist. Regardless, the
in-progress incremental sort patch uncovered a new case where it
definitely no longer holds, and, as a result, a higher cost plan ends up
being chosen because a low startup cost partial path is ignored in favor
of a lower total cost partial path and a limit is applied on top of that
which would normally favor the lower startup cost plan.
---
 src/backend/optimizer/util/pathnode.c | 47 ++-
 1 file changed, 18 insertions(+), 29 deletions(-)

diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 60c93ee7c5..f24ba587e5 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -777,41 +777,30 @@ add_partial_path(RelOptInfo *parent_rel, Path *new_path)
 		/* Unless pathkeys are incompatible, keep just one of the two paths. */
 		if (keyscmp != PATHKEYS_DIFFERENT)
 		{
-			if (new_path->total_cost > old_path->total_cost * STD_FUZZ_FACTOR)
-			{
-				/* New path costs more; keep it only if pathkeys are better. */
-				if (keyscmp != PATHKEYS_BETTER1)
-					accept_new = false;
-			}
-			else if (old_path->total_cost > new_path->total_cost
-					 * STD_FUZZ_FACTOR)
+			PathCostComparison costcmp;
+
+			/*
+			 * Do a fuzzy cost comparison with standard fuzziness limit.
+			 */
+			costcmp = compare_path_costs_fuzzily(new_path, old_path,
+												 STD_FUZZ_FACTOR);
+
+			if (costcmp == COSTS_BETTER1)
 			{
-				/* Old path costs more; keep it only if pathkeys are better. */
-				if (keyscmp != PATHKEYS_BETTER2)
+				if (keyscmp == PATHKEYS_BETTER1)
 					remove_old = true;
 			}
-			else if (keyscmp == PATHKEYS_BETTER1)
-			{
-				/* Costs are about the same, new path has better pathkeys. */
-				remove_old =
Re: Should we add xid_current() or a int8->xid cast?
On 11/3/19 2:43 PM, Thomas Munro wrote: On Tue, Oct 29, 2019 at 5:23 PM btfujiitkp wrote: Thomas Munro writes: On Sun, Sep 1, 2019 at 5:04 PM Thomas Munro wrote: Adding to CF. Rebased. An OID clashed so re-roll the dice. Also spotted a typo. I have some questions in this code. Thanks for looking at the patch. First, "FullTransactionIdPrecedes(xmax, val)" is not equal to "val >= xmax" of the previous code. "FullTransactionIdPrecedes(xmax, val)" expresses "val > xmax". Is it all right? @@ -384,15 +324,17 @@ parse_snapshot(const char *str) while (*str != '\0') { /* read next value */ - val = str2txid(str, ); + val = FullTransactionIdFromU64(pg_strtouint64(str, , 10)); str = endp; /* require the input to be in order */ - if (val < xmin || val >= xmax || val < last_val) + if (FullTransactionIdPrecedes(val, xmin) || + FullTransactionIdPrecedes(xmax, val) || + FullTransactionIdPrecedes(val, last_val)) In addition to it, as to current TransactionId(not FullTransactionId) comparison, when we express ">=" of TransactionId, we use "TransactionIdFollowsOrEquals". this method is referred by some methods. On the other hand, FullTransactionIdFollowsOrEquals has not implemented yet. So, how about implementing this method? Good idea. I added the missing variants: +#define FullTransactionIdPrecedesOrEquals(a, b) ((a).value <= (b).value) +#define FullTransactionIdFollows(a, b) ((a).value > (b).value) +#define FullTransactionIdFollowsOrEquals(a, b) ((a).value >= (b).value) Second, About naming rule, "8" of xid8 means 8 bytes, but "8" has different meaning in each situation. For example, int8 of PostgreSQL means 8 bytes, int8 of C language means 8 bits. If 64 is used, it just means 64 bits. how about xid64()? In C, the typenames use bits, by happy coincidence similar to the C99 stdint.h typenames (int32_t etc) that we should perhaps eventually switch to. 
In SQL, the types have names based on the number of bytes: int2, int4, int8, float4, float8, not conforming to any standard but established over 3 decades ago and also understood by a few other SQL systems. That's unfortunate, but I can't see that ever changing. I thought that it would make most sense for the SQL type to be called xid8, though admittedly it doesn't quite fit the pattern because xid is not called xid4. There is another example a bit like that: macaddr (6 bytes) and macaddr8 (8 bytes). As for the C type, we use TransactionId and FullTransactionId (rather than, say, xid32 and xid64). In the attached I also took Tom's advice and used the unused_oids script to pick random OIDs >= 8000 for all new objects (ignoring nearby comments about the range of OIDs used in different sections of the file). These two patches (v3) no longer apply cleanly. Could you please rebase? -- Mark Dilger
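The comparison macros quoted in this thread can be exercised in isolation; here is a standalone sketch with a stand-in struct, since this is not compiled against the backend headers:

```c
#include <stdint.h>

/* Stand-in for PostgreSQL's FullTransactionId: a uint64 wrapped in a
 * struct so it cannot be mixed up with plain integers. */
typedef struct FullTransactionId
{
    uint64_t    value;
} FullTransactionId;

/* The comparison macros under discussion, including the newly added
 * OrEquals/Follows variants. */
#define FullTransactionIdPrecedes(a, b)         ((a).value < (b).value)
#define FullTransactionIdPrecedesOrEquals(a, b) ((a).value <= (b).value)
#define FullTransactionIdFollows(a, b)          ((a).value > (b).value)
#define FullTransactionIdFollowsOrEquals(a, b)  ((a).value >= (b).value)
```

This also makes the earlier review point concrete: `FullTransactionIdPrecedes(xmax, val)` means `val > xmax`, not `val >= xmax`, so the boundary case differs from the original comparison it replaced.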
Re: Using multiple extended statistics for estimates
On 11/14/19 12:04 PM, Tomas Vondra wrote: On Thu, Nov 14, 2019 at 10:23:44AM -0800, Mark Dilger wrote: On 11/14/19 7:55 AM, Tomas Vondra wrote: On Wed, Nov 13, 2019 at 10:04:36AM -0800, Mark Dilger wrote: On 11/13/19 7:28 AM, Tomas Vondra wrote: Hi, here's an updated patch, with some minor tweaks based on the review and added tests (I ended up reworking those a bit, to make them more like the existing ones). Thanks, Tomas, for the new patch set! Attached are my review comments so far, in the form of a patch applied on top of yours. Thanks. 1) It's not clear to me why adding 'const' to the List parameters would be useful? Can you explain? When I first started reviewing the functions, I didn't know if those lists were intended to be modified by the function. Adding 'const' helps document that the function does not intend to change them. Hmmm, ok. I'll think about it, but we're not really using const* in this way very much I think - at least not in the surrounding code. 2) I think you're right we can change find_strongest_dependency to do /* also skip weaker dependencies when attribute count matches */ if (strongest->nattributes == dependency->nattributes && strongest->degree >= dependency->degree) continue; That'll skip some additional dependencies, which seems OK. 3) It's not clear to me what you mean by * TODO: Improve this code comment. Specifically, why would we * ignore that no rows will match? It seems that such a discovery * would allow us to return an estimate of 0 rows, and that would * be useful. added to dependencies_clauselist_selectivity. Are you saying we should also compute selectivity estimates for individual clauses and use Min() as a limit? Maybe, but that seems unrelated to the patch. I mean that the comment right above that TODO is hard to understand. You seem to be saying that it is good and proper to only take the selectivity estimate from the final clause in the list, but then go on to say that other clauses might prove that no rows will match. 
So that implies that by ignoring all but the last clause, we're ignoring such other clauses that prove no rows can match. But why would we be ignoring those? I am not arguing that your code is wrong. I'm just critiquing the hard-to-understand phrasing of that code comment. Aha, I think I understand now - thanks for the explanation. You're right the comment is trying to explain why just taking the last clause for a given attnum is fine. I'll try to make the comment clearer. Are you planning to submit a revised patch for this? -- Mark Dilger
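The selection rule being discussed (prefer dependencies on more attributes, and among equal attribute counts prefer the higher degree, skipping the rest) can be shown as a standalone sketch. This is not the actual extended-statistics code; the struct layout and function name are simplified stand-ins for `MVDependency` and `find_strongest_dependency`:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for PostgreSQL's MVDependency; layout is hypothetical. */
typedef struct
{
    int     nattributes;    /* number of attributes in the dependency */
    double  degree;         /* strength of the dependency, 0.0 .. 1.0 */
} Dependency;

/*
 * Return the "strongest" dependency: prefer more attributes, and among
 * dependencies with the same attribute count prefer a higher degree.
 * The second test mirrors the proposed tweak quoted above: when attribute
 * counts match, skip any candidate whose degree does not beat the best so far.
 */
static const Dependency *
find_strongest(const Dependency *deps, size_t ndeps)
{
    const Dependency *strongest = NULL;

    for (size_t i = 0; i < ndeps; i++)
    {
        const Dependency *dependency = &deps[i];

        if (strongest)
        {
            /* skip dependencies on fewer attributes */
            if (strongest->nattributes > dependency->nattributes)
                continue;

            /* also skip weaker dependencies when attribute count matches */
            if (strongest->nattributes == dependency->nattributes &&
                strongest->degree >= dependency->degree)
                continue;
        }

        strongest = dependency;
    }

    return strongest;
}
```

With this rule a two-attribute dependency always beats a one-attribute one, and ties on attribute count are broken by degree, which is exactly the extra skipping the snippet in the mail proposes.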
Re: Make autovacuum sort tables in descending order of xid_age
On Sat, Nov 30, 2019 at 10:04:07AM -0800, Mark Dilger wrote: > On 11/29/19 2:21 PM, David Fetter wrote: > > On Fri, Nov 29, 2019 at 07:01:39PM +0100, David Fetter wrote: > > > Folks, > > > > > > Per a suggestion Christophe made, please find attached a patch to > > > $Subject: > > > > > > Apart from carefully fudging with pg_resetwal, and short of running in > > > production for a few weeks, what would be some good ways to test this? > > > > Per discussion on IRC with Sehrope Sarkuni, please find attached a > > patch with one fewer bug, this one in the repalloc() calls. > > Hello David, > > Here are my initial thoughts. > > Although you appear to be tackling the problem of vacuuming tables > with older Xids first *per database*, Yes, that's what's come up for me in production, but lately, production has consisted of a single active DB maxing out hardware. I can see how in other situations--multi-tenant, especially--it would make more sense to sort the DBs first. > have you considered changing the logic in building and sorting the > database list in get_database_list and rebuild_database_list? I'm > just curious what your thoughts might be on this subject. I hadn't, but now that you mention it, it seems like a reasonable thing to try. > As far as sorting the list of tables in an array and then copying > that array into a linked list, I think there is no need. The > copying of table_ages into table_oids is followed immediately by > > foreach(cell, table_oids) > > and then table_oids seems not to serve any further purpose. Perhaps > you can just iterate over table_ages directly and avoid the extra > copying. I hadn't looked toward any optimizations in this section, given that the vacuums in question can take hours or days, but I can see how that would make the code cleaner, so please find that change attached. > I have not tested this change, but I may do so later today or perhaps > on Monday. Thanks for looking at this! Best, David. 
-- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate >From 16aca9efcf1f9402f80acd70701815990327e956 Mon Sep 17 00:00:00 2001 From: David Fetter Date: Wed, 27 Nov 2019 23:50:48 -0800 Subject: [PATCH v3] Autovacuum tables in descending order of age To: hackers MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="2.23.0" This is a multi-part message in MIME format. --2.23.0 Content-Type: text/plain; charset=UTF-8; format=fixed Content-Transfer-Encoding: 8bit diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c index c1dd8168ca..038b3f1272 100644 --- a/src/backend/postmaster/autovacuum.c +++ b/src/backend/postmaster/autovacuum.c @@ -288,6 +288,15 @@ typedef struct AutoVacuumWorkItem av_workItems[NUM_WORKITEMS]; } AutoVacuumShmemStruct; +/* + * Struct for deciding which tables to vacuum first + */ +typedef struct +{ + uint32 relfrozenxid_age; + Oid table_oid; +} RelFrozenXidAge; + static AutoVacuumShmemStruct *AutoVacuumShmem; /* @@ -319,6 +328,7 @@ static void rebuild_database_list(Oid newdb); static int db_comparator(const void *a, const void *b); static void autovac_balance_cost(void); +static int rfxa_comparator(const void *a, const void *b); static void do_autovacuum(void); static void FreeWorkerInfo(int code, Datum arg); @@ -1936,7 +1946,9 @@ do_autovacuum(void) HeapTuple tuple; TableScanDesc relScan; Form_pg_database dbForm; - List *table_oids = NIL; + RelFrozenXidAge *table_ages; + int table_ages_size = 1024; + int table_ages_count = 0; List *orphan_oids = NIL; HASHCTL ctl; HTAB *table_toast_map; @@ -1951,6 +1963,7 @@ do_autovacuum(void) bool found_concurrent_worker = false; int i; + table_ages = (RelFrozenXidAge *)palloc(table_ages_size * sizeof(RelFrozenXidAge)); /* * StartTransactionCommand and CommitTransactionCommand will automatically * switch to other contexts. 
We need this one to keep the list of @@ -2102,9 +2115,22 @@ do_autovacuum(void) effective_multixact_freeze_max_age, &dovacuum, &doanalyze, &wraparound); - /* Relations that need work are added to table_oids */ + /* Relations that need work are added to table_ages */ if (dovacuum || doanalyze) - table_oids = lappend_oid(table_oids, relid); + { + RelFrozenXidAge rfxa; + rfxa.relfrozenxid_age = DirectFunctionCall1(xid_age, + classForm->relfrozenxid); + rfxa.table_oid = relid; + if (table_ages_count == table_ages_size) + { + table_ages_size *= 2; + table_ages = (RelFrozenXidAge *) repalloc(table_ages, table_ages_size * sizeof(RelFrozenXidAge)); + } + table_ages[table_ages_count] = rfxa; + table_ages_count++; + } + /* * Remember TOAST associations for the second pass. Note: we must do @@ -2187,7 +2213,20 @@ do_autovacuum(void) /* ignore analyze for toast tables */ if (dovacuum) - table_oids = lappend_oid(table_oids,
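The patch declares `rfxa_comparator` but the hunk defining it is cut off above. A plausible standalone sketch of what such a qsort comparator would look like, ordering tables by descending `relfrozenxid_age` so the tables closest to wraparound are vacuumed first (this is an assumed definition, not the patch's actual code):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Mirrors the struct added by the patch; uint32_t stands in for Oid. */
typedef struct
{
    uint32_t relfrozenxid_age;
    uint32_t table_oid;
} RelFrozenXidAge;

/*
 * qsort comparator: largest relfrozenxid_age sorts first, so autovacuum
 * visits the oldest tables before the rest.
 */
static int
rfxa_comparator(const void *a, const void *b)
{
    const RelFrozenXidAge *ra = (const RelFrozenXidAge *) a;
    const RelFrozenXidAge *rb = (const RelFrozenXidAge *) b;

    if (ra->relfrozenxid_age > rb->relfrozenxid_age)
        return -1;              /* older table sorts earlier */
    if (ra->relfrozenxid_age < rb->relfrozenxid_age)
        return 1;
    return 0;
}
```

With `table_ages` sorted this way, the subsequent per-table loop processes relations in descending order of age, which is the whole point of the patch.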
Re: [HACKERS] Block level parallel vacuum
On Sat, 30 Nov 2019 at 19:18, Sergei Kornilov wrote: > Hello > > Its possible to change order of index processing by parallel leader? In > v35 patchset I see following order: > - start parallel processes > - leader and parallel workers processed index lixt and possible skip some > entries > - after that parallel leader recheck index list and process the skipped > indexes > - WaitForParallelWorkersToFinish > > I think it would be better to: > - start parallel processes > - parallel leader goes through index list and process only indexes which > are skip_parallel_index_vacuum = true > - parallel workers processes indexes with skip_parallel_index_vacuum = > false > - parallel leader start participate with remainings parallel-safe index > processing > - WaitForParallelWorkersToFinish > > This would be less running time and better load balance across leader and > workers in case of few non-parallel and few parallel indexes. > (if this is expected and required by some reason, we need a comment in > code) > > Also few notes to vacuumdb: > Seems we need version check at least in vacuum_one_database and > prepare_vacuum_command. Similar to SKIP_LOCKED or DISABLE_PAGE_SKIPPING > features. > discussion question: difference between --parallel and --jobs parameters > will be confusing? We need more description for this options > While testing with different server configuration settings, I am getting an error (ERROR: no unpinned buffers available) in parallel vacuum, but normal vacuum is working fine. *Test Setup*: max_worker_processes = 40 autovacuum = off shared_buffers = 128kB max_parallel_workers = 40 max_parallel_maintenance_workers = 40 vacuum_cost_limit = 2000 vacuum_cost_delay = 10 *Table description:* the table has 16 indexes (14 btree, 1 hash, 1 BRIN) and 1,000,000 tuples in total; I am deleting all the tuples and then firing the vacuum command. Run the attached .sql file (test_16_indexes.sql): $ ./psql postgres postgres=# \i test_16_indexes.sql Re-start the server and do vacuum.
Case 1) normal vacuum: postgres=# vacuum test ; VACUUM Time: 115174.470 ms (01:55.174) Case 2) parallel vacuum using 10 parallel workers: postgres=# vacuum (parallel 10) test ; ERROR: no unpinned buffers available CONTEXT: parallel worker postgres=# This error comes from the 128kB shared buffer setting: I launched 10 parallel workers, and all are working in parallel, so with so little shared buffer space I get this error. Is this expected behavior with a small shared buffer size, or should we try to come up with a solution for this? Please let me know your thoughts. Thanks and Regards Mahendra Thalor EnterpriseDB: http://www.enterprisedb.com
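The arithmetic behind the failure is worth spelling out: with PostgreSQL's default 8kB block size, shared_buffers = 128kB is only 16 buffers, shared by the leader plus ten workers, each of which must pin buffers concurrently. A back-of-the-envelope model (the helper and the per-backend reasoning are illustrative, not measured):

```c
#include <assert.h>

/*
 * Rough model of buffer pressure in the failing setup: number of shared
 * buffers is shared_buffers divided by the block size (default BLCKSZ = 8kB).
 */
static int
buffers_available(int shared_buffers_kb, int block_size_kb)
{
    return shared_buffers_kb / block_size_kb;
}
```

Sixteen buffers split across eleven backends leaves barely one buffer each, so once several workers hold pins at the same time, the next pin request finds no unpinned buffer and the "no unpinned buffers available" error is expected rather than surprising.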
Re: pgbench -i progress output on terminal
I wrote the v1 patch on CentOS Linux, and now on MacOS. It would be great if someone can volunteer to test on Windows terminal. I do not have that. Attached v3. Patch applies, compiles, works for me. No further comments. I switched the patch as ready. -- Fabien.
Re: Append with naive multiplexing of FDWs
On Sun, Nov 17, 2019 at 09:54:55PM +1300, Thomas Munro wrote: > On Sat, Sep 28, 2019 at 4:20 AM Bruce Momjian wrote: > > On Wed, Sep 4, 2019 at 06:18:31PM +1200, Thomas Munro wrote: > > > A few years back[1] I experimented with a simple readiness API that > > > would allow Append to start emitting tuples from whichever Foreign > > > Scan has data available, when working with FDW-based sharding. I used > > > that primarily as a way to test Andres's new WaitEventSet stuff and my > > > kqueue implementation of that, but I didn't pursue it seriously > > > because I knew we wanted a more ambitious async executor rewrite and > > > many people had ideas about that, with schedulers capable of jumping > > > all over the tree etc. > > > > > > Anyway, Stephen Frost pinged me off-list to ask about that patch, and > > > asked why we don't just do this naive thing until we have something > > > better. It's a very localised feature that works only between Append > > > and its immediate children. The patch makes it work for postgres_fdw, > > > but it should work for any FDW that can get its hands on a socket. > > > > > > Here's a quick rebase of that old POC patch, along with a demo. Since > > > 2016, Parallel Append landed, but I didn't have time to think about > > > how to integrate with that so I did a quick "sledgehammer" rebase that > > > disables itself if parallelism is in the picture. > > > > Yes, sharding has been waiting on parallel FDW scans. Would this work > > for parallel partition scans if the partitions were FDWs? > > Yeah, this works for partitions that are FDWs (as shown), but only for > Append, not for Parallel Append. 
So you'd have parallelism in the > sense that your N remote shard servers are all doing stuff at the same > time, but it couldn't be in a parallel query on your 'home' server, > which is probably good for things that push down aggregation and bring > back just a few tuples from each shard, but bad for anything wanting > to ship back millions of tuples to chew on locally. Do you think > that'd be useful enough on its own? Yes, I think so. There are many data warehouse queries that want to return only aggregate values, or filter for a small number of rows. Even OLTP queries might return only a few rows from multiple partitions. This would allow for a proof-of-concept implementation so we can see how realistic this approach is. > The problem is that parallel safe non-partial plans (like postgres_fdw > scans) are exclusively 'claimed' by one process under Parallel Append, > so with the patch as posted, if you modify it to allow parallelism > then it'll probably give correct answers but nothing prevents a single > process from claiming and starting all the scans and then waiting for > them to be ready, while the other processes miss out on doing any work > at all. There's probably some kludgy solution involving not letting > any one worker start more than X, and some space cadet solution > involving passing sockets around and teaching libpq to hand over > connections at certain controlled phases of the protocol (due to lack > of threads), but nothing like that has jumped out as the right path so > far. I am unclear how many queries can do any meaningful work until all shards have given their full results. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
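The "readiness" idea Thomas describes, where Append emits tuples from whichever Foreign Scan has data available, boils down to level-triggered socket readiness. PostgreSQL actually uses its WaitEventSet abstraction for this; the following is a minimal standalone sketch using plain POSIX poll(), with a hypothetical function name and a hypothetical array of child socket fds:

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Return the index of the first child scan whose socket is read-ready,
 * or -1 on timeout/error. An Append-like node could call this instead of
 * blocking on child 0 while children 1..n-1 already have tuples queued.
 */
static int
first_ready_child(const int *fds, int nchildren, int timeout_ms)
{
    struct pollfd pfds[64];     /* assumes a small, bounded child count */

    for (int i = 0; i < nchildren; i++)
    {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }

    if (poll(pfds, nchildren, timeout_ms) <= 0)
        return -1;

    for (int i = 0; i < nchildren; i++)
        if (pfds[i].revents & POLLIN)
            return i;

    return -1;
}
```

The Parallel Append complication Thomas raises is visible even in this sketch: if one worker owns all the fds and sits in poll(), other workers have nothing to wait on, which is why the non-partial-plan "claiming" problem needs a different mechanism.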
Re: Make autovacuum sort tables in descending order of xid_age
On 11/29/19 2:21 PM, David Fetter wrote: On Fri, Nov 29, 2019 at 07:01:39PM +0100, David Fetter wrote: Folks, Per a suggestion Christophe made, please find attached a patch to $Subject: Apart from carefully fudging with pg_resetwal, and short of running in production for a few weeks, what would be some good ways to test this? Per discussion on IRC with Sehrope Sarkuni, please find attached a patch with one fewer bug, this one in the repalloc() calls. Hello David, Here are my initial thoughts. Although you appear to be tackling the problem of vacuuming tables with older Xids first *per database*, have you considered changing the logic in building and sorting the database list in get_database_list and rebuild_database_list? I'm just curious what your thoughts might be on this subject. As far as sorting the list of tables in an array and then copying that array into a linked list, I think there is no need. The copying of table_ages into table_oids is followed immediately by foreach(cell, table_oids) and then table_oids seems not to serve any further purpose. Perhaps you can just iterate over table_ages directly and avoid the extra copying. I have not tested this change, but I may do so later today or perhaps on Monday. -- Mark Dilger
Re: [HACKERS] Block level parallel vacuum
Hello

Is it possible to change the order of index processing by the parallel leader? In the v35 patchset I see the following order:
- start parallel processes
- leader and parallel workers process the index list and possibly skip some entries
- after that, the parallel leader rechecks the index list and processes the skipped indexes
- WaitForParallelWorkersToFinish

I think it would be better to:
- start parallel processes
- parallel leader goes through the index list and processes only indexes which have skip_parallel_index_vacuum = true
- parallel workers process indexes with skip_parallel_index_vacuum = false
- parallel leader starts participating in the remaining parallel-safe index processing
- WaitForParallelWorkersToFinish

This would mean less running time and a better load balance across leader and workers in the case of a few non-parallel and a few parallel indexes. (If the current order is expected and required for some reason, we need a comment in the code.)

Also a few notes on vacuumdb: It seems we need a version check at least in vacuum_one_database and prepare_vacuum_command, similar to the SKIP_LOCKED or DISABLE_PAGE_SKIPPING features. Discussion question: will the difference between the --parallel and --jobs parameters be confusing? We need more description for these options.

regards, Sergei
Re: log bind parameter values on error
Hi, I'm sorry for replying so late. I don't think those really are contradictions. You can continue to have an errdetail_params(), and but call it from the error context callback set up in the portal code ... Even leaving that aside, I'm *STRONGLY* against entangling elog.c with query execution details like ParamListInfo. We should work to make elog.c know less about different parts of the system, not more. I've implemented it using error contexts, please could you have a look at the patch attached? I did not use the PG_TRY blocks in portal code because they are not wide enough. Otherwise, we wouldn't catch errors that can happen in GetCachedPlan. Instead I added an item to error context chain in postgres.c exec_*_message routines. I can't find why it could be unsafe, but I'm still a bit worried about it as there's no other error context changes that early in the call stack. I've also made string formatting more efficient, and pre-calculate the whole string instead of individual values as suggested. Unfortunately it still copies the error text when reporting -- I'm not sure how I change it without substantial change of error logging infrastructure. One more concern I'd like to ask for opinions. I'm using errdetail_log for parameters logging, as it's the only way to make it appear in the log only but not get sent to the client. It looks a bit awkward, as it appears like ERROR DETAIL CONTEXT STATEMENT, where CONTEXT may in fact be something inner nested than the parameters that appear in DETAILS. With this regards, I'm thinking of adding some special arrangements into elog.c. One idea is a econtext_log to add a log-only data into CONTEXT, however it'll still log params above the statement, although directly above. Another option is a special estatementparams (or perhaps efinalcontext_log?) to log below the statement. What do you think? 2 Alvaro: So, if some parameters are large (they can be up to 1 GB-1, remember) then we can bloat the log file severely. 
I think we need to place an upper limit on the strings that we're going to log -- as inspiration, callers of ExecBuildValueDescription use 64 chars per value maximum. Something like that seems reasonable. So I think you need to add some pg_mbcliplen() calls in a couple of places in exec_bind_message. I like the idea, but I don't think it's directly related to the change proposed. We are already logging parameters in certain situations with no limits on their lengths. Given the time it takes to get the change through (not least because of me addressing reviewers' comments really slowly) I'd not include it in the patch. Do you find it acceptable? Best, Alex diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 4ec13f3311..b3a0d27861 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -6597,6 +6597,29 @@ log_line_prefix = '%m [%p] %q%u@%d/%a ' + + log_parameters_on_error (boolean) + + log_parameters_on_error configuration parameter + + + + +Controls whether bind parameters are logged when a statement is logged +as a result of . +It adds some overhead, as postgres will compute and store textual +representations of parameter values in memory for all statements, +even if they eventually do not get logged. +This setting has no effect on statements logged due to + or + settings, as they are always logged +with parameters. +The default is off. +Only superusers can change this setting. + + + + log_statement (enum) diff --git a/src/backend/nodes/params.c b/src/backend/nodes/params.c index cf4387e40f..7c1f86f5c5 100644 --- a/src/backend/nodes/params.c +++ b/src/backend/nodes/params.c @@ -15,12 +15,16 @@ #include "postgres.h" +#include "lib/stringinfo.h" #include "nodes/bitmapset.h" #include "nodes/params.h" #include "storage/shmem.h" #include "utils/datum.h" +#include "utils/guc.h" #include "utils/lsyscache.h" +#include "utils/memutils.h" +static void LogParams(void *arg); /* * Allocate and initialize a new ParamListInfo structure. 
@@ -44,6 +48,7 @@ makeParamList(int numParams) retval->paramCompileArg = NULL; retval->parserSetup = NULL; retval->parserSetupArg = NULL; + retval->logString = NULL; retval->numParams = numParams; return retval; @@ -58,6 +63,8 @@ makeParamList(int numParams) * set of parameter values. If dynamic parameter hooks are present, we * intentionally do not copy them into the result. Rather, we forcibly * instantiate all available parameter values and copy the datum values. + * + * We also don't copy logString. */ ParamListInfo copyParamList(ParamListInfo from) @@ -221,6 +228,8 @@ SerializeParamList(ParamListInfo paramLI, char **start_address) * set of parameter values. If dynamic parameter hooks are present, we * intentionally do not copy them
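The error-context approach the patch adopts follows elog.c's linked-list-of-callbacks pattern: a frame is pushed before executing the statement and popped afterward, and when an error is reported the stack is walked newest-first so each frame can append detail (here, the bind parameters). The following is a minimal standalone model of that pattern, not PostgreSQL's actual elog.c; the type names are borrowed from src/include/utils/elog.h for flavor:

```c
#include <assert.h>
#include <stddef.h>

/* One frame of the error context stack, as in PostgreSQL's elog.h. */
typedef struct ErrorContextCallback
{
    struct ErrorContextCallback *previous;
    void (*callback) (void *arg);
    void *arg;
} ErrorContextCallback;

static ErrorContextCallback *error_context_stack = NULL;

/* Captured "detail" output; a stand-in for what errdetail_log would emit. */
static const char *logged_params = NULL;

static void
log_params_callback(void *arg)
{
    logged_params = (const char *) arg;
}

/* What an error report does: walk the stack, newest frame first. */
static void
run_error_context_callbacks(void)
{
    for (ErrorContextCallback *ecb = error_context_stack; ecb; ecb = ecb->previous)
        ecb->callback(ecb->arg);
}
```

The push/pop discipline (link in a frame, restore `previous` on exit) is what lets the parameters be logged from any error raised underneath, including errors from GetCachedPlan that a narrower PG_TRY block in the portal code would miss.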
Re: pgbench -i progress output on terminal
Hi Fabien, Thanks for taking a look again. On Sat, Nov 30, 2019 at 4:28 PM Fabien COELHO wrote: > > I have updated the patch based on these observations. Attached v2. > > Patch v2 applies & compiles cleanly, works for me. > > I'm not partial to Hungarian notation conventions, which is not widely > used elsewhere in pg. I'd suggest eolchar -> eol or line_end or whatever, > but others may have different opinion. Maybe having a char variable is a > rare enough occurence which warrants advertising it. On second thought, I'm fine with just eol. > Maybe use fputc instead of fprintf in the closing output? OK, done. > I'm unsure about what happens on MacOS and Windows terminal, but if it > works for other commands progress options, it is probably all right. I wrote the v1 patch on CentOS Linux, and now on MacOS. It would be great if someone can volunteer to test on Windows terminal. Attached v3. Thanks, Amit compactify-pgbench-init-progress-output_v3.patch Description: Binary data
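The `eol` variable discussed above captures the whole trick of the patch: end each progress report with '\r' on a terminal so successive reports overwrite in place, and with '\n' otherwise so log files and pipes still get one line per report. A small sketch of that decision (the function name is made up for illustration; the actual patch inlines the check):

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Pick the character that ends each progress line: carriage return on a
 * terminal (overwrite in place), newline otherwise (one line per report).
 */
static char
select_eol(FILE *out)
{
    return isatty(fileno(out)) ? '\r' : '\n';
}
```

This is also why portability on Windows came up in the thread: isatty/fileno behave slightly differently there, so testing on a Windows terminal is worthwhile before relying on the '\r' path.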