Re: feature request ctid cast / sql exception
This is a specific use case: I have a big table without a PK. Updates with ctid are blazing fast even without an index; I don't need one. The argument behind this is that users expect this functionality; it's not just me. Search Stack Overflow. They end up using various suboptimal solutions, as I described earlier. This is very, very simple functionality, so please consider it. I'm also writing an open-source library that would make use of this. My users will be thankful to you. On Sat, Apr 17, 2021, 23:05 David G. Johnston wrote: > On Sat, Apr 17, 2021 at 12:58 PM Vladimír Houba ml. > wrote: > >> I use ctid as a row identifier within a transaction in a Java application. >> > > This doesn't present a very compelling argument since an actual user > declared primary key is what is expected to be used as a row identifier. > And as those are typically bigint if you follow this norm you get exactly > what you say you need. > > David J. > >
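The suboptimal workarounds in question look roughly like the following sketch; there is currently no ctid-to-bigint cast, so applications round-trip through text (table and column names here are illustrative, not from the thread):

```sql
BEGIN;
-- Capture the row's physical address as text, e.g. '(0,1)'
SELECT ctid::text, payload FROM big_table WHERE payload = 'x';
-- Later in the same transaction, address the row by its stored ctid;
-- a TID scan is fast even without an index, but note that ctids are
-- only stable within the transaction.
UPDATE big_table SET payload = 'y' WHERE ctid = '(0,1)'::tid;
COMMIT;
```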
Re: Core dump happens when execute sql CREATE VIEW v1(c1) AS (SELECT ('4' COLLATE "C")::INT FROM generate_series(1, 10));
After several tests, I found that this patch does not fix the bug well. I think we should use the same logic to treat a parent CollateExpr and a child CollateExpr. In your patch, if the parent node is a CollateExpr and the target type is non-collatable, we coerce CollateExpr->arg. If the child node is a CollateExpr and the target type is non-collatable, we just skip. Some types can be cast to another type even if type_is_collatable returns false, like bytea to int (it depends on the content of the string). If we simply skip, bytea will never be cast to int even if the content is all digits. So the attachment is my patch, and it works well as far as I have tested. From: Tom Lane Sent: April 13, 2021, 0:59 To: Yulin PEI Cc: pgsql-hackers@lists.postgresql.org Subject: Re: Core dump happens when execute sql CREATE VIEW v1(c1) AS (SELECT ('4' COLLATE "C")::INT FROM generate_series(1, 10)); Yulin PEI writes: > I found it could cause a crash when executing sql statement: `CREATE VIEW > v1(c1) AS (SELECT ('4' COLLATE "C")::INT FROM generate_series(1, 10)); ` in > postgres 13.2 release. Nice catch. I don't think the code in DefineVirtualRelation is wrong: exprCollation shouldn't report any collation for an expression of a non-collatable type. Rather the problem is with an old kluge in coerce_type(), which will push a type coercion underneath a CollateExpr ... without any mind for the possibility that the coercion result isn't collatable. So the right fix is more or less the attached. regards, tom lane fix-collation-coercion.patch Description: fix-collation-coercion.patch
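For reference, the reported crasher boils down to a CollateExpr whose result is coerced to a non-collatable type; on an unpatched 13.2 server this statement triggers the core dump:

```sql
-- COLLATE "C" applies to the text literal, and coerce_type() then pushes
-- the ::int coercion underneath the CollateExpr, leaving a CollateExpr
-- over a non-collatable (integer) result.
CREATE VIEW v1(c1) AS
    (SELECT ('4' COLLATE "C")::int FROM generate_series(1, 10));
```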
Re: feature request ctid cast / sql exception
I use ctid as a row identifier within a transaction in a Java application. To obtain the row ctid I either have to
- cast it to text and store it as a String
- cast it to string, then convert it to a bigint using a UDF, which is inefficient
I wish I could just cast ctid to bigint and store it as a primitive long type. Regarding the exception-throwing function, it makes good sense, for example, in CASE blocks when you encounter an unexpected value. IMHO a "one size fits all" solution may be a raise function with the same syntax as the RAISE statement in plpgsql.
RAISE([ level ] 'format' [, expression [, ... ]] [ USING option = expression [, ... ] ])
RAISE([ level ] condition_name [ USING option = expression [, ... ] ])
RAISE([ level ] SQLSTATE 'sqlstate' [ USING option = expression [, ... ] ])
RAISE([ level ] USING option = expression [, ... ])
RAISE()
On Sat, Apr 17, 2021 at 9:46 PM Tom Lane wrote: > "David G. Johnston" writes: > > On Sat, Apr 17, 2021 at 10:58 AM Vladimír Houba ml. > > wrote: > >> Another nice feature would be a function that can be called from a sql > >> statement and would throw an exception when executed. > > > An assertion-related extension in core would be welcomed. > > This has been suggested before, but as soon as you start looking > at the details you find that it's really hard to get a one-size-fits-all > definition that's any simpler than the existing plpgsql RAISE > functionality. Different people have different ideas about how > much decoration they want around the message. So, if 10% of the > world agrees with your choices and the other 90% keeps on using > a custom plpgsql function to do it their way, you haven't really > improved matters much. OTOH a 90% solution might be interesting to > incorporate in core, but nobody's demonstrated that one exists. > > regards, tom lane > -- Best regards, Vladimír Houba ml.
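The "custom plpgsql function" Tom refers to typically looks something like this sketch (the function name and signature are illustrative, not an existing API):

```sql
-- A helper that can be called from a SQL expression and raises an error
-- when evaluated, e.g. in the ELSE arm of a CASE.
CREATE FUNCTION raise_error(msg text) RETURNS bigint
LANGUAGE plpgsql AS $$
BEGIN
    RAISE EXCEPTION '%', msg;
END;
$$;

-- Usage: fails with "unexpected value: 42" when the second row is reached.
SELECT CASE WHEN x IN (1, 2, 3) THEN x
            ELSE raise_error('unexpected value: ' || x)
       END
FROM (VALUES (1), (42)) AS t(x);
```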
Re: [CLOBBER_CACHE]Server crashed with segfault 11 while executing clusterdb
On Fri, Apr 09, 2021 at 06:45:45PM -0400, Alvaro Herrera wrote: > We forgot this patch earlier in the commitfest. Do people think we > should still get it in on this cycle? I'm +1 on that, since it's a > safety feature poised to prevent more bugs than it's likely to > introduce. No objections from here to do that now even after feature freeze. I also wonder, while looking at that, why you don't just remove the last call within src/backend/catalog/heap.c. This way, nobody is tempted to use RelationOpenSmgr() anymore, and it could just be removed from rel.h. -- Michael signature.asc Description: PGP signature
Re: Remove redundant variable from transformCreateStmt
On Mon, Apr 19, 2021 at 11:05 AM Bharath Rupireddy wrote: > > On Mon, Apr 19, 2021 at 9:32 AM Amul Sul wrote: > > Kindly ignore the previous version -- has unnecessary change. > > See the attached. > > Thanks for the patch! > > How about a slight rewording of the added comment to "Constraints > validation can be skipped for a newly created table as it contains no > data. However, this is not necessarily true for a foreign table."? > Well, wording is quite subjective; let's leave this to the committer for the final decision. I don't see anything wrong with it. > You may want to add it to the commitfest if not done already. And I > don't think we need to backpatch this as it's not critical. This is not fixing anything, so it is not a relevant candidate for backporting. Regards, Amul
Re: Table refer leak in logical replication
On Mon, Apr 19, 2021 at 03:08:41PM +0900, Michael Paquier wrote: > FWIW, I > would be tempted to send f1ac27b back to the drawing board, then refactor > the code of the apply worker to use ExecInitResultRelation() so that we > get more consistency with resource releases, simplifying the business > with indexes. Once the code is in a cleaner state, we could come back > into making an integration with partitioned tables into this code. But you cannot do that either, as f1ac27bf got into 13. -- Michael
Re: Table refer leak in logical replication
On Sat, Apr 17, 2021 at 07:02:00PM +0530, Amit Kapila wrote: > Hmm, I am not sure if it is a good idea to open indexes needlessly > especially when it is not done in the previous code. Studying the history of this code, I think that f1ac27b is to blame here for making the code of the apply worker much messier than it was before. Before that, we were at a point where we had to rely on one single ResultRelInfo, with all its indexes opened and closed before doing the DML. After f1ac27b, the code is shaped such that the original ResultRelInfo may or may not be used, depending on whether this code is working on a partitioned table or not. With an UPDATE, not one but *two* ResultRelInfo may be used if a tuple is moved to a different partition. I think that in the long term, and if we want to make use of ExecInitResultRelation() in this area, we are going to need to split the apply code in two parts, roughly (easier to say in words than actually doing it, still):
- Find out first which relations it is necessary to work on, and create a set of ResultRelInfo assigned to an executor state by ExecInitResultRelation(), doing all the relation openings that are necessary. This gets funky when attempting to do an update across partitions.
- Do the actual DML, with all the relations already opened and ready for use.
On top of that, I am really getting scared by the following, done in not one but now two places:
/*
 * The tuple to be updated could not be found.
 *
 * TODO what to do here, change the log level to LOG perhaps?
 */
elog(DEBUG1, "logical replication did not find row for update "
     "in replication target relation \"%s\"",
     RelationGetRelationName(localrel));
This already existed in one place before f1ac27b, but it got duplicated in a second area when applying the first update to a partitioned table.
The refactoring change done in 1375422c in worker.c without the tuple routing pieces would be a piece of cake in terms of the relations that need to be opened and closed, including the timing of each call, because they could be unified in single code paths, and I also guess that we would avoid leak issues really easily. If the tuple routing code actually does not consider the case of moving a tuple across partitions, the level of difficulty to do an integration with ExecInitResultRelation() is much more reduced, though it makes the feature much less appealing as it becomes much more difficult to do some data redistribution across a different set of partitions with logical changes. > I am not sure if it is a good idea to do the refactoring related to > indexes or other things to fix a minor bug in commit 1375422c. It > might be better to add a simple fix like what Hou-San has originally > proposed [1] because even using ExecInitResultRelation might not be > the best thing as it is again trying to open a range table which we > have already opened in logicalrep_rel_open. OTOH, using > ExecInitResultRelation might encapsulate the assignment we are doing > outside. Yeah, it would be nice to just rely on that. copyfrom.c basically does what I guess we should try to mimic as much as possible here, with a proper cleanup of the executor state using ExecCloseResultRelations() once we are done with the tuple apply. > In general, it seems doing bigger refactoring or changes > might lead to some bugs or unintended consequences, so if possible, we > can try such refactoring as a separate patch. One argument against the > proposed refactoring could be that with the previous code, we were > trying to open the indexes just before it is required but with the new > patch in some cases, they will be opened during the initial phase and > for other cases, they are opened just before they are required.
It > might not necessarily be a bad idea to rearrange code like that but > maybe there is a better way to do that. > > [1] - > https://www.postgresql.org/message-id/OS0PR01MB571686F75FBDC219FF3DFF0D94769%40OS0PR01MB5716.jpnprd01.prod.outlook.com This feels like piling one extra hack on top of what looks like an abuse of the executor calls to me, and the apply code is already full of it. True that we do that in ExecuteTruncateGuts() to allow triggers to be fired, but I think that it would be better to avoid spreading that, to consolidate the trigger and execution code. FWIW, I would be tempted to send f1ac27b back to the drawing board, then refactor the code of the apply worker to use ExecInitResultRelation() so that we get more consistency with resource releases, simplifying the business with indexes. Once the code is in a cleaner state, we could come back into making an integration with partitioned tables into this code. -- Michael
Windows default locale vs initdb
Hi, Moving this topic into its own thread from the one about collation versions, because it concerns pre-existing problems, and that thread is long. Currently initdb sets up template databases with old-style Windows locale names reported by the OS, and they seem to have caused us quite a few problems over the years:
db29620d "Work around Windows locale name with non-ASCII character."
aa1d2fc5 "Another attempt at fixing Windows Norwegian locale."
db477b69 "Deal with yet another issue related to "Norwegian (Bokmål)"..."
9f12a3b9 "Tolerate version lookup failure for old style Windows locale..."
... and probably more, and also various threads about, for example, "German_German.1252" vs "German_Switzerland.1252", which seem to get confused or badly canonicalised or rejected somewhere in the mix. I hadn't focused on any of that before, being a non-Windows-user, but the entire contents of win32setlocale.c support the theory that Windows' manual meant what it said when it said[1]: "We do not recommend this form for locale strings embedded in code or serialized to storage, because these strings are more likely to be changed by an operating system update than the locale name form." I suppose that was the only form available at the time the code was written, so there was no choice. The question we asked ourselves multiple times in the other thread was how we're supposed to get to the modern BCP 47 form when creating the template databases. It looks like one possibility, since Vista, is to call GetUserDefaultLocaleName()[2], which doesn't appear to have been discussed before on this list. That doesn't allow you to ask for the default for each individual category, but I don't know if that is even a concept for Windows user settings. It may be that some of the other nearby functions give a better answer for some reason. But one thing is clear from a test that someone kindly ran for me: it reports standardised strings like "en-NZ", not strings like "English_New Zealand.1252".
No patch, but I wondered if any Windows hackers have any feedback on the relative sanity of trying to fix all these problems this way.
[1] https://docs.microsoft.com/en-us/cpp/c-runtime-library/locale-names-languages-and-country-region-strings?view=msvc-160
[2] https://docs.microsoft.com/en-us/windows/win32/api/winnls/nf-winnls-getuserdefaultlocalename
Re: Remove redundant variable from transformCreateStmt
On Mon, Apr 19, 2021 at 9:32 AM Amul Sul wrote: > Kindly ignore the previous version -- has unnecessary change. > See the attached. Thanks for the patch! How about a slight rewording of the added comment to "Constraints validation can be skipped for a newly created table as it contains no data. However, this is not necessarily true for a foreign table."? You may want to add it to the commitfest if not done already. And I don't think we need to backpatch this as it's not critical. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: PG Docs - CREATE SUBSCRIPTION option list order
On Mon, Apr 19, 2021 at 10:32 AM Peter Smith wrote: > > On Mon, Apr 19, 2021 at 2:09 PM Amit Kapila wrote: > > > > On Mon, Apr 19, 2021 at 6:32 AM Euler Taveira wrote: > > > > > > On Sun, Apr 18, 2021, at 8:59 PM, Peter Smith wrote: > > > > > > The CREATE SUBSCRIPTION documentation [1] includes a list of "WITH" > > > options, which are currently in some kind of quasi alphabetical / > > > random order which I found unnecessarily confusing. > > > > > > I can't think of any good reason for the current ordering, so PSA my > > > patch which has identical content but just re-orders that option list > > > to be alphabetical. > > > > > > AFAICS there is not reason to use a random order here. I think this > > > parameter > > > list is in frequency of use. Your patch looks good to me. > > > > > > > I also agree that the current order is quite random. One idea is to > > keep them in alphabetical order as suggested by Peter and the other > > could be to arrange parameters based on properties, for example, there > > are few parameters like binary, streaming, copy_data which are in some > > way related to the data being replicated and others are more of slot > > properties (create_slot, slot_name). I see that few parameters among > > these have some dependencies on other parameters as well. I noticed > > that the other DDL commands like Create Table, Create Index doesn't > > have the WITH clause parameters in any alphabetical order so it might > > be better if the related parameters can be together here but if we > > think it is not a good idea in this case due to some reason then > > probably keeping them in alphabetical order makes sense. > > > > Yes, if there were dozens of list items then I would agree that they > should be grouped somehow. But there aren't. > I think this list is going to grow in the future as we enhance this subsystem. For example, the pending 2PC patch will add another parameter to this list. -- With Regards, Amit Kapila.
Re: Replication slot stats misgivings
On Mon, Apr 19, 2021 at 9:00 AM Masahiko Sawada wrote: > > On Fri, Apr 16, 2021 at 2:58 PM Amit Kapila wrote: > > > > > > 4. > > +CREATE VIEW pg_stat_replication_slots AS > > +SELECT > > +s.slot_name, > > +s.spill_txns, > > +s.spill_count, > > +s.spill_bytes, > > +s.stream_txns, > > +s.stream_count, > > +s.stream_bytes, > > +s.total_txns, > > +s.total_bytes, > > +s.stats_reset > > +FROM pg_replication_slots as r, > > +LATERAL pg_stat_get_replication_slot(slot_name) as s > > +WHERE r.datoid IS NOT NULL; -- excluding physical slots > > .. > > .. > > > > -/* Get the statistics for the replication slots */ > > +/* Get the statistics for the replication slot */ > > Datum > > -pg_stat_get_replication_slots(PG_FUNCTION_ARGS) > > +pg_stat_get_replication_slot(PG_FUNCTION_ARGS) > > { > > #define PG_STAT_GET_REPLICATION_SLOT_COLS 10 > > - ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo; > > + text *slotname_text = PG_GETARG_TEXT_P(0); > > + NameData slotname; > > > > I think with the above changes getting all the slot stats has become > > much costlier. Is there any reason why can't we get all the stats from > > the new hash_table in one shot and return them to the user? > > I think the advantage of this approach would be that it can avoid > showing the stats for already-dropped slots. Like other statistics > views such as pg_stat_all_tables and pg_stat_all_functions, searching > the stats by the name got from pg_replication_slots can show only > available slot stats even if the hash table has garbage slot stats. > Sounds reasonable. However, if the create_slot message is missed, it will show an empty row for it. See below:
postgres=# select slot_name, total_txns from pg_stat_replication_slots;
 slot_name | total_txns
-----------+------------
 s1        |          0
 s2        |          0
           |
(3 rows)
Here, I have manually (via debugger) skipped sending the create_slot message for the third slot, and we are showing an empty row for it.
This won't happen for pg_stat_all_tables, as it will set 0 or other initial values in such a case. I think we need to address this case. > Given that pg_stat_replication_slots doesn’t show garbage slot stats > even if it has, I thought we can avoid checking those garbage stats > frequently. It should not essentially be a problem for the hash table > to have entries up to max_replication_slots regardless of live or > already-dropped. > Yeah, but I guess we still might not save much by not doing it, especially because for the other cases like tables/functions, we are doing it without any threshold limit. -- With Regards, Amit Kapila.
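To illustrate the cost concern, with the refactored interface a full read of the view goes through one function call per slot, while a single slot's stats can be fetched directly; both shapes come from the patch under discussion (slot name 's1' is a placeholder):

```sql
-- All slots: the view's LATERAL join calls the per-slot function once
-- per row of pg_replication_slots.
SELECT slot_name, total_txns, total_bytes FROM pg_stat_replication_slots;

-- One slot, bypassing the view:
SELECT * FROM pg_stat_get_replication_slot('s1');
```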
Re: PG Docs - CREATE SUBSCRIPTION option list order
On Mon, Apr 19, 2021 at 2:09 PM Amit Kapila wrote: > > On Mon, Apr 19, 2021 at 6:32 AM Euler Taveira wrote: > > > > On Sun, Apr 18, 2021, at 8:59 PM, Peter Smith wrote: > > > > The CREATE SUBSCRIPTION documentation [1] includes a list of "WITH" > > options, which are currently in some kind of quasi alphabetical / > > random order which I found unnecessarily confusing. > > > > I can't think of any good reason for the current ordering, so PSA my > > patch which has identical content but just re-orders that option list > > to be alphabetical. > > > > AFAICS there is not reason to use a random order here. I think this > > parameter > > list is in frequency of use. Your patch looks good to me. > > > > I also agree that the current order is quite random. One idea is to > keep them in alphabetical order as suggested by Peter and the other > could be to arrange parameters based on properties, for example, there > are few parameters like binary, streaming, copy_data which are in some > way related to the data being replicated and others are more of slot > properties (create_slot, slot_name). I see that few parameters among > these have some dependencies on other parameters as well. I noticed > that the other DDL commands like Create Table, Create Index doesn't > have the WITH clause parameters in any alphabetical order so it might > be better if the related parameters can be together here but if we > think it is not a good idea in this case due to some reason then > probably keeping them in alphabetical order makes sense. > Yes, if there were dozens of list items then I would agree that they should be grouped somehow. But there aren't. I think what may seem like a clever grouping to one reader may look more like an over-complicated muddle to somebody else. So I prefer just to apply the KISS Principle. -- Kind Regards, Peter Smith. Fujitsu Australia.
Re: New Table Access Methods for Multi and Single Inserts
On Mon, Apr 5, 2021 at 9:49 AM Bharath Rupireddy wrote: > > On Wed, Mar 10, 2021 at 10:21 AM Bharath Rupireddy > wrote: > > Attaching the v4 patch set. Please review it further. > > Attaching v5 patch set after rebasing onto the latest master. Another rebase due to conflicts in 0003. Attaching v6 for review. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com From 6518212583e24b017375512701d9fefa6de20e42 Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Wed, 10 Mar 2021 09:53:48 +0530 Subject: [PATCH v6 1/3] New Table AMs for Multi and Single Inserts This patch introduces new table access methods for multi and single inserts. Also implements/rearranges the outside code for heap am into these new APIs. Main design goal of these new APIs is to give flexibility to tableam developers in implementing multi insert logic dependent on the underlying storage engine. Currently, for all the underlying storage engines, we follow the same multi insert logic such as when and how to flush the buffered tuples, tuple size calculation, and this logic doesn't take into account the underlying storage engine capabilities. We can also avoid duplicating multi insert code (for existing COPY, and upcoming CTAS, CREATE/REFRESH MAT VIEW and INSERT SELECTs). We can also move bulk insert state allocation and deallocation inside these APIs. 
--- src/backend/access/heap/heapam.c | 212 +++ src/backend/access/heap/heapam_handler.c | 5 + src/backend/access/table/tableamapi.c| 7 + src/backend/executor/execTuples.c| 83 - src/include/access/heapam.h | 49 +- src/include/access/tableam.h | 87 ++ src/include/executor/tuptable.h | 1 + 7 files changed, 438 insertions(+), 6 deletions(-) diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index 3b435c107d..d8bfe17f22 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -67,6 +67,7 @@ #include "utils/datum.h" #include "utils/inval.h" #include "utils/lsyscache.h" +#include "utils/memutils.h" #include "utils/relcache.h" #include "utils/snapmgr.h" #include "utils/spccache.h" @@ -2669,6 +2670,217 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples, pgstat_count_heap_insert(relation, ntuples); } +/* + * heap_insert_begin - allocate and initialize TableInsertState + * + * For single inserts: + * 1) Specify is_multi as false, then multi insert state will be NULL. + * + * For multi inserts: + * 1) Specify is_multi as true, then multi insert state will be allocated and + * initialized. + * + * Other input parameters i.e. relation, command id, options are common for + * both single and multi inserts. + */ +TableInsertState* +heap_insert_begin(Relation rel, CommandId cid, int options, bool is_multi) +{ + TableInsertState *state; + + state = palloc(sizeof(TableInsertState)); + state->rel = rel; + state->cid = cid; + state->options = options; + /* Below parameters are not used for single inserts. 
*/ + state->mi_slots = NULL; + state->mistate = NULL; + state->mi_cur_slots = 0; + state->flushed = false; + + if (is_multi) + { + HeapMultiInsertState *mistate; + + mistate = palloc(sizeof(HeapMultiInsertState)); + state->mi_slots = +palloc0(sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES); + mistate->max_slots = MAX_BUFFERED_TUPLES; + mistate->max_size = MAX_BUFFERED_BYTES; + mistate->cur_size = 0; + /* + * Create a temporary memory context so that we can reset once per + * multi insert batch. + */ + mistate->context = AllocSetContextCreate(CurrentMemoryContext, + "heap_multi_insert", + ALLOCSET_DEFAULT_SIZES); + state->mistate = mistate; + } + + return state; +} + +/* + * heap_insert_v2 - insert single tuple into a heap + * + * Insert tuple from slot into table. This is like heap_insert(), the only + * difference is that the parameters for insertion are inside table insert + * state structure. + */ +void +heap_insert_v2(TableInsertState *state, TupleTableSlot *slot) +{ + bool shouldFree = true; + HeapTuple tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree); + + Assert(state); + + /* Update tuple with table oid */ + slot->tts_tableOid = RelationGetRelid(state->rel); + tuple->t_tableOid = slot->tts_tableOid; + + /* Perform insertion, and copy the resulting ItemPointer */ + heap_insert(state->rel, tuple, state->cid, state->options, state->bistate); + ItemPointerCopy(&tuple->t_self, &slot->tts_tid); + + if (shouldFree) + pfree(tuple); +} + +/* + * heap_multi_insert_v2 - insert multiple tuples into a heap + * + * Compute size of tuple. See if the buffered slots can hold the tuple. If yes, + * store it in the buffers, otherwise flush i.e. insert the so far buffered + * tuples into heap. + * + * Flush can happen: + * 1) either if all the buffered slots are filled up + * 2) or if total tuple size of the currently buffered slot
Re: Performance degradation of REFRESH MATERIALIZED VIEW
On Fri, Apr 16, 2021 at 12:16 PM Kyotaro Horiguchi wrote: > > At Mon, 12 Apr 2021 15:20:41 +0900, Masahiko Sawada > wrote in > > . > > > > On Thu, Mar 11, 2021 at 5:44 PM Masahiko Sawada > > wrote: > > > > > > Hi, > > > > > > While discussing freezing tuples during CTAS[1], we found that > > > heap_insert() with HEAP_INSERT_FROZEN brings performance degradation. > > > For instance, with Paul's patch that sets HEAP_INSERT_FROZEN to CTAS, > > > it took 12 sec whereas the code without the patch took 10 sec with the > > > following query: > > > > > > create table t1 (a, b, c, d) as select i,i,i,i from > > > generate_series(1,2000) i; > > > > > > I've done a simple benchmark of REFRESH MATERIALIZED VIEW with the > > > following queries: > > > > > > create table source as select generate_series(1, 5000); > > > create materialized view mv as select * from source; > > > refresh materialized view mv; > > > > > > The execution time of REFRESH MATERIALIZED VIEW are: > > > > > > w/ HEAP_INSERT_FROZEN flag : 42 sec > > > w/o HEAP_INSERT_FROZEN flag : 33 sec > > > > > > After investigation, I found that such performance degradation happens > > > on only HEAD code. It seems to me that commit 39b66a91b (and > > > 7db0cd2145) is relevant that has heap_insert() set VM bits and > > > PD_ALL_VISIBLE if HEAP_INSERT_FROZEN is specified (so CCing Tomas > > > Vondra and authors). Since heap_insert() sets PD_ALL_VISIBLE to the > > > page when inserting a tuple for the first time on the page (around > > > L2133 in heapam.c), every subsequent heap_insert() on the page reads > > > and pins a VM buffer (see RelationGetBufferForTuple()). > > > > IIUC RelationGetBufferForTuple() pins vm buffer if the page is > > all-visible since the caller might clear vm bit during operation. But > > it's not necessarily true in HEAP_FROZEN_INSERT case. When inserting > > HEAP_FROZEN_INSERT, we might set PD_ALL_VISIBLE flag and all-visible > > bit but never clear those flag and bit during insertion. 
Therefore to > > fix this issue, I think we can have RelationGetBufferForTuple() not to > > pin vm buffer if we're inserting a frozen tuple (i.e., > > HEAP_FROZEN_INSERT case) and the target page is already all-visible. > > It seems right to me. > > > In HEAP_FROZEN_INSERT, the cases where we need to pin vm buffer would > > be the table is empty. That way, we will pin vm buffer only for the > > first time of inserting frozen tuple into the empty page, then set > > PD_ALL_VISIBLE to the page and all-frozen bit on vm. Also set > > XLH_INSERT_ALL_FROZEN_SET to WAL. At further insertions, we would not > > pin vm buffer as long as we’re inserting a frozen tuple into the same > > page. > > > > If the target page is neither empty nor all-visible we will not pin vm > > buffer, which is fine because if the page has non-frozen tuple we > > cannot set bit on vm during heap_insert(). If all tuples on the page > > are already frozen but PD_ALL_VISIBLE is not set for some reason, we > > would be able to set all-frozen bit on vm but it seems not a good idea > > since it requires checking during insertion if all existing tuples are > > frozen or not. > > > > The attached patch does the above idea. With this patch, the same > > performance tests took 33 sec. Thank you for the comments. > > Great! The direction of the patch looks fine to me. > > +* If we're inserting frozen entry into empty page, we will > set > +* all-visible to page and all-frozen on visibility map. > +*/ > + if (PageGetMaxOffsetNumber(page) == 0) > all_frozen_set = true; > > AFAICS the page is always empty when RelationGetBufferForTuple > returned a valid vmbuffer. So the "if" should be an "assert" instead. There is a chance that RelationGetBufferForTuple() returns a valid vmbuffer but the page is not empty, since RelationGetBufferForTuple() checks whether the page is empty without holding a lock.
But when it comes to HEAP_INSERT_FROZEN cases it actually doesn’t happen at least for now since only one process inserts tuples into the relation. Will fix. > > And, the patch changes the value of all_frozen_set to false when the > page was already all-frozen (thus not empty). It would be fine since > we don't need to change the visibility of the page in that case but > the variable name is no longer correct. set_all_visible or such? It seems to me that the variable name all_frozen_set corresponds to XLH_INSERT_ALL_FROZEN_SET but I see your point. How about set_all_frozen instead since we set all-frozen bits (also implying setting all-visible)? BTW I found the following description of XLH_INSERT_ALL_FROZEN_SET but there is no all_visible_set anywhere: /* all_frozen_set always implies all_visible_set */ #define XLH_INSERT_ALL_FROZEN_SET (1<<5) I'll update those comments as well. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Replication slot stats misgivings
On Fri, Apr 16, 2021 at 8:50 AM Justin Pryzby wrote: > > On Fri, Apr 16, 2021 at 08:48:29AM +0530, Amit Kapila wrote: > > I am fine with your proposed changes. There are one or two more > > patches in this area. I can include your suggestions along with those > > if you don't mind? > > Whatever's convenient is fine > Thanks for your suggestions. I have pushed your changes as part of the commit c64dcc7fee. -- With Regards, Amit Kapila.
Re: Replication slot stats misgivings
On Sun, Apr 18, 2021 at 6:51 PM Masahiko Sawada wrote: > > Yes, also the following expectation in expected/stats.out is wrong: > > SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS > spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS > total_bytes FROM pg_stat_replication_slots;
>     slot_name    | spill_txns | spill_count | total_txns | total_bytes
> -----------------+------------+-------------+------------+-------------
>  regression_slot | f          | f           | t          | t
> (1 row)
>
> We should expect all values to be 0. Please find attached the patch.
Right. Both your patch and Vignesh's will fix the problem, but I mildly prefer Vignesh's one as it seems a bit simpler. So, I went ahead and pushed his patch with minor other changes. Thanks to both of you. -- With Regards, Amit Kapila.
Re: PG Docs - CREATE SUBSCRIPTION option list order
On Mon, Apr 19, 2021 at 6:32 AM Euler Taveira wrote: > > On Sun, Apr 18, 2021, at 8:59 PM, Peter Smith wrote: > > The CREATE SUBSCRIPTION documentation [1] includes a list of "WITH" > options, which are currently in some kind of quasi alphabetical / > random order which I found unnecessarily confusing. > > I can't think of any good reason for the current ordering, so PSA my > patch which has identical content but just re-orders that option list > to be alphabetical. > > AFAICS there is no reason to use a random order here. I think this parameter > list is ordered by frequency of use. Your patch looks good to me. > I also agree that the current order is quite random. One idea is to keep them in alphabetical order as suggested by Peter; the other could be to arrange parameters based on properties: for example, a few parameters like binary, streaming, and copy_data are in some way related to the data being replicated, while others are more like slot properties (create_slot, slot_name). I see that a few parameters among these have some dependencies on other parameters as well. I noticed that other DDL commands like Create Table and Create Index don't have their WITH clause parameters in any alphabetical order, so it might be better if the related parameters can be kept together here; but if we think that is not a good idea in this case for some reason, then keeping them in alphabetical order probably makes sense. -- With Regards, Amit Kapila.
Re: Remove redundant variable from transformCreateStmt
On Mon, Apr 19, 2021 at 9:28 AM Amul Sul wrote: > > On Fri, Apr 16, 2021 at 6:26 AM Bharath Rupireddy > wrote: > > > > On Thu, Apr 15, 2021 at 8:40 PM Jeevan Ladhe > > wrote: > > > IMHO, I think the idea here was to just get rid of an unnecessary variable > > > rather than refactoring. > > > > > > On Thu, Apr 15, 2021 at 5:48 PM Bharath Rupireddy > > > wrote: > > >> > > >> On Thu, Apr 15, 2021 at 5:04 PM Amul Sul wrote: > > >> > > > >> > Hi, > > >> > > > >> > Attached patch removes "is_foreign_table" from transformCreateStmt() > > >> > since it already has cxt.isforeign that can serve the same purpose. > > >> > > >> Yeah having that variable as "is_foreign_table" doesn't make sense > > >> when we have the info in ctx. I'm wondering whether we can do the > > >> following (like transformFKConstraints). It will be more readable and > > >> we could also add more comments on why we don't skip validation for > > >> check constraints i.e. constraint->skip_validation = false in case for > > >> foreign tables. > > > > > > To address your concern here, I think it can be addressed by adding a > > > comment > > > just before we make a call to transformCheckConstraints(). > > > > +1. > > Ok, added the comment in the attached version. Kindly ignore the previous version -- has unnecessary change. See the attached. Regards, Amul diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c index 9dd30370dae..9aaa4bde278 100644 --- a/src/backend/parser/parse_utilcmd.c +++ b/src/backend/parser/parse_utilcmd.c @@ -176,7 +176,6 @@ transformCreateStmt(CreateStmt *stmt, const char *queryString) Oid namespaceid; Oid existing_relid; ParseCallbackState pcbstate; - bool is_foreign_table = IsA(stmt, CreateForeignTableStmt); /* * We must not scribble on the passed-in CreateStmt, so copy it. (This is @@ -333,8 +332,12 @@ transformCreateStmt(CreateStmt *stmt, const char *queryString) /* * Postprocess check constraints. 
+ *
+ * For a table, the constraint can be considered validated immediately,
+ * because the table must be empty. But for a foreign table this is not
+ * necessarily the case.
   */
- transformCheckConstraints(&cxt, !is_foreign_table ? true : false);
+ transformCheckConstraints(&cxt, !cxt.isforeign);

 /*
  * Postprocess extended statistics.
Re: Remove redundant variable from transformCreateStmt
On Fri, Apr 16, 2021 at 6:26 AM Bharath Rupireddy wrote: > > On Thu, Apr 15, 2021 at 8:40 PM Jeevan Ladhe > wrote: > > IMHO, I think the idea here was to just get rid of an unnecessary variable > > rather than refactoring. > > > > On Thu, Apr 15, 2021 at 5:48 PM Bharath Rupireddy > > wrote: > >> > >> On Thu, Apr 15, 2021 at 5:04 PM Amul Sul wrote: > >> > > >> > Hi, > >> > > >> > Attached patch removes "is_foreign_table" from transformCreateStmt() > >> > since it already has cxt.isforeign that can serve the same purpose. > >> > >> Yeah having that variable as "is_foreign_table" doesn't make sense > >> when we have the info in ctx. I'm wondering whether we can do the > >> following (like transformFKConstraints). It will be more readable and > >> we could also add more comments on why we don't skip validation for > >> check constraints i.e. constraint->skip_validation = false in case for > >> foreign tables. > > > > To address your concern here, I think it can be addressed by adding a > > comment > > just before we make a call to transformCheckConstraints(). > > +1. Ok, added the comment in the attached version. Thanks Jeevan & Bharat for the review. Regards, Amul diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c index 9dd30370dae..17182500c61 100644 --- a/src/backend/parser/parse_utilcmd.c +++ b/src/backend/parser/parse_utilcmd.c @@ -176,7 +176,6 @@ transformCreateStmt(CreateStmt *stmt, const char *queryString) Oid namespaceid; Oid existing_relid; ParseCallbackState pcbstate; - bool is_foreign_table = IsA(stmt, CreateForeignTableStmt); /* * We must not scribble on the passed-in CreateStmt, so copy it. (This is @@ -327,14 +326,18 @@ transformCreateStmt(CreateStmt *stmt, const char *queryString) cxt.alist = list_concat(cxt.alist, cxt.likeclauses); /* - * Postprocess foreign-key constraints. + * Postprocess foreign-key and check constraints. */ transformFKConstraints(&cxt, true, false); /* * Postprocess check constraints. 
+ *
+ * For a table, the constraint can be considered validated immediately,
+ * because the table must be empty. But for a foreign table this is not
+ * necessarily the case.
   */
- transformCheckConstraints(&cxt, !is_foreign_table ? true : false);
+ transformCheckConstraints(&cxt, !cxt.isforeign);

 /*
  * Postprocess extended statistics.
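The reasoning in the new comment can be illustrated at the SQL level; the foreign server and option names below are invented for the sketch:

```sql
-- A table created here is necessarily empty, so its CHECK constraint
-- holds trivially and can be marked validated right away.
CREATE TABLE t (a int CHECK (a > 0));

-- A foreign table only maps pre-existing remote data, which may already
-- violate the constraint, so it cannot be assumed validated at creation.
-- (Assumes an FDW server named "remote_srv" already exists.)
CREATE FOREIGN TABLE ft (a int CHECK (a > 0))
    SERVER remote_srv OPTIONS (table_name 'remote_ft');
```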
Re: Replication slot stats misgivings
On Fri, Apr 16, 2021 at 2:58 PM Amit Kapila wrote: > > On Thu, Apr 15, 2021 at 4:35 PM Masahiko Sawada wrote: > > > > Thank you for the update! The patch looks good to me. > > > > I have pushed the first patch. Comments on the next patch > v13-0001-Use-HTAB-for-replication-slot-statistics: > 1. > + /* > + * Check for all replication slots in stats hash table. We do this check > + * when replSlotStats has more than max_replication_slots entries, i.e, > + * when there are stats for the already-dropped slot, to avoid frequent > + * call SearchNamedReplicationSlot() which acquires LWLock. > + */ > + if (replSlotStats && hash_get_num_entries(replSlotStats) > > max_replication_slots) > + { > + PgStat_ReplSlotEntry *slotentry; > + > + hash_seq_init(&hstat, replSlotStats); > + while ((slotentry = (PgStat_ReplSlotEntry *) hash_seq_search(&hstat)) != > NULL) > + { > + if (SearchNamedReplicationSlot(NameStr(slotentry->slotname), true) == NULL) > + pgstat_report_replslot_drop(NameStr(slotentry->slotname)); > + } > + } > > Is SearchNamedReplicationSlot() so frequently used that we need to do > this only when the hash table has entries more than > max_replication_slots? I think it would be better if we can do it > without such a condition to reduce the chances of missing the slot > stats. We don't have any such restrictions for any other cases in this > function. Please see below comment on #4. > > I think it is better to add CHECK_FOR_INTERRUPTS in the above while loop? Agreed. > > 2. > /* > * Replication slot statistics kept in the stats collector > */ > -typedef struct PgStat_ReplSlotStats > +typedef struct PgStat_ReplSlotEntry > > I think the comment above this structure can be changed to "The > collector's data per slot" or something like that. Also, if we have to > follow table/function/db style, then probably this structure should be > named as PgStat_StatReplSlotEntry. Agreed. > > 3. > - * create the statistics for the replication slot. 
> + * create the statistics for the replication slot. In case where the > + * message for dropping the old slot gets lost and a slot with the same is > > /the same is/the same name is/. > > Can we mention something similar to what you have added here in docs as well? Agreed. > > 4. > +CREATE VIEW pg_stat_replication_slots AS > +SELECT > +s.slot_name, > +s.spill_txns, > +s.spill_count, > +s.spill_bytes, > +s.stream_txns, > +s.stream_count, > +s.stream_bytes, > +s.total_txns, > +s.total_bytes, > +s.stats_reset > +FROM pg_replication_slots as r, > +LATERAL pg_stat_get_replication_slot(slot_name) as s > +WHERE r.datoid IS NOT NULL; -- excluding physical slots > .. > .. > > -/* Get the statistics for the replication slots */ > +/* Get the statistics for the replication slot */ > Datum > -pg_stat_get_replication_slots(PG_FUNCTION_ARGS) > +pg_stat_get_replication_slot(PG_FUNCTION_ARGS) > { > #define PG_STAT_GET_REPLICATION_SLOT_COLS 10 > - ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo; > + text *slotname_text = PG_GETARG_TEXT_P(0); > + NameData slotname; > > I think with the above changes getting all the slot stats has become > much costlier. Is there any reason why can't we get all the stats from > the new hash_table in one shot and return them to the user? I think the advantage of this approach would be that it can avoid showing the stats for already-dropped slots. Like other statistics views such as pg_stat_all_tables and pg_stat_all_functions, searching the stats by the name got from pg_replication_slots can show only available slot stats even if the hash table has garbage slot stats. Given that pg_stat_replication_slots doesn’t show garbage slot stats even if it has, I thought we can avoid checking those garbage stats frequently. It should not essentially be a problem for the hash table to have entries up to max_replication_slots regardless of live or already-dropped. 
As another design, we can get all stats from the hash table in one shot as you suggested. If we do that, it's better to check garbage slot stats every time pgstat_vacuum_stat() is called so the view doesn't show those stats but cannot avoid that completely. I'll change the code pointed out by #1 and #4 according to this design discussion. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: doc review for v14
On Fri, Apr 16, 2021 at 02:03:10AM -0500, Justin Pryzby wrote: > A bunch more found with things like this. Thanks, applied most of it! -- Michael signature.asc Description: PGP signature
Re: File truncation within PostgresNode::issues_sql_like() wrong on Windows
On Sat, Apr 17, 2021 at 09:55:47AM -0400, Andrew Dunstan wrote: > Yes please, much better to use a symbolic name rather than a magic > number. I wouldn't bother backpatching it though. Okay, done this way then. -- Michael signature.asc Description: PGP signature
Re: PG Docs - CREATE SUBSCRIPTION option list order
On Sun, Apr 18, 2021, at 8:59 PM, Peter Smith wrote: > The CREATE SUBSCRIPTION documentation [1] includes a list of "WITH" > options, which are currently in some kind of quasi alphabetical / > random order which I found unnecessarily confusing. > > I can't think of any good reason for the current ordering, so PSA my > patch which has identical content but just re-orders that option list > to be alphabetical. AFAICS there is no reason to use a random order here. I think this parameter list is ordered by frequency of use. Your patch looks good to me. -- Euler Taveira EDB https://www.enterprisedb.com/
PG Docs - CREATE SUBSCRIPTION option list order
Hi, The CREATE SUBSCRIPTION documentation [1] includes a list of "WITH" options, which are currently in some kind of quasi alphabetical / random order which I found unnecessarily confusing. I can't think of any good reason for the current ordering, so PSA my patch which has identical content but just re-orders that option list to be alphabetical. -- [1] = https://www.postgresql.org/docs/devel/sql-createsubscription.html Kind Regards, Peter Smith. Fujitsu Australia v1-0001-create-subscription-options-list-order.patch Description: Binary data
Re: track_planning causing performance regression
Reviewing this change, which was committed last year as 321fa6a4a26c9b649a0fbec9fc8b019f19e62289. On Fri, Jul 03, 2020 at 03:57:38PM +0900, Fujii Masao wrote: > On 2020/07/03 13:05, Pavel Stehule wrote: > > On Fri, Jul 3, 2020 at 4:39, Fujii Masao > > wrote: > > > > Maybe it can be documented that enabling this option can have a negative > > impact on performance. > > Yes. What about adding either of the following into the doc? > > Enabling this parameter may incur a noticeable performance penalty. > > or > > Enabling this parameter may incur a noticeable performance penalty, > especially when a fewer kinds of queries are executed on many > concurrent connections. Something seems wrong with this sentence, and I'm not sure what it's trying to say. Is this right? > Enabling this parameter may incur a noticeable performance penalty, > especially when a small number of queries are executed on many > concurrent connections. -- Justin
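For reference, the parameter whose documentation is being discussed is used roughly like this (a sketch; it assumes pg_stat_statements is already loaded via shared_preload_libraries):

```sql
-- Turn on plan-time tracking; the proposed doc text warns this can cost
-- noticeable throughput when many connections run few distinct queries.
ALTER SYSTEM SET pg_stat_statements.track_planning = on;
SELECT pg_reload_conf();

-- The columns it populates:
SELECT query, plans, total_plan_time
FROM pg_stat_statements
ORDER BY total_plan_time DESC
LIMIT 5;
```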
Re: pg_amcheck option to install extension
On 2021-Apr-18, Andrew Dunstan wrote: > On 4/17/21 3:43 PM, Mark Dilger wrote: > > I'd also like your impressions on whether we're likely to move > > contrib/amcheck into core anytime soon. If so, is it worth adding > > an option that we'll soon need to deprecate? > > I think if it stays as an extension it will stay in contrib. But it sure > feels very odd to have a core bin program that relies on a contrib > extension. It seems one or the other is misplaced. I've proposed in the past that we should have a way to provide extensions other than contrib -- specifically src/extensions/ -- and then have those extensions installed together with the rest of core. Then it would be perfectly legitimate to have src/bin/pg_amcheck depend on that extension. I agree that the current situation is not great. -- Álvaro Herrera 39°49'30"S 73°17'W "Thou shalt not follow the NULL pointer, for chaos and madness await thee at its end." (2nd Commandment for C programmers)
Re: partial heap only tuples
On Tue, Feb 9, 2021 at 10:48 AM Bossart, Nathan wrote: > I'm hoping to gather some early feedback on a heap optimization I've > been working on. In short, I'm hoping to add "partial heap only > tuple" (PHOT) support, which would allow you to skip updating indexes > for unchanged columns even when other indexes require updates. Today, > HOT works wonders when no indexed columns are updated. However, as > soon as you touch one indexed column, you lose that optimization > entirely, as you must update every index on the table. The resulting > performance impact is a pain point for many of our (AWS's) enterprise > customers, so we'd like to lend a hand for some improvements in this > area. For workloads involving a lot of columns and a lot of indexes, > an optimization like PHOT can make a huge difference. I'm aware that > there was a previous attempt a few years ago to add a similar > optimization called WARM [0] [1]. However, I only noticed this > previous effort after coming up with the design for PHOT, so I ended > up taking a slightly different approach. I am also aware of a couple > of recent nbtree improvements that may mitigate some of the impact of > non-HOT updates [2] [3], but I am hoping that PHOT serves as a nice > complement to those. I've attached a very early proof-of-concept > patch with the design described below. I would like to share some thoughts that I have about how I think about optimizations like PHOT, and how they might fit together with my own work -- particularly the nbtree bottom-up index deletion feature you referenced. My remarks could equally well apply to WARM. Ordinarily this is the kind of thing that would be too hand-wavey for the mailing list, but we don't have the luxury of in-person communication right now. Everybody tends to talk about HOT as if it works perfectly once you make some modest assumptions, such as "there are no long-running transactions", and "no UPDATEs will logically modify indexed columns". 
But I tend to doubt that that's truly the case -- I think that there are still pathological cases where HOT cannot keep the total table size stable in the long run due to subtle effects that eventually aggregate into significant issues, like heap fragmentation. Ask Jan Wieck about the stability of some of the TPC-C/BenchmarkSQL tables to get one example of this. There is no reason to believe that PHOT will help with that. Maybe that's okay, but I would think carefully about what that means if I were undertaking this work. Ensuring stability in the on-disk size of tables in cases where the size of the logical database is stable should be an important goal of a project like PHOT or HOT. If you want to get a better sense of how these inefficiencies might happen, I suggest looking into using recently added autovacuum logging to analyze how well HOT works today, using the technique that I go into here: https://postgr.es/m/cah2-wzkju+nibskzunbdpz6trse+aqvupae+xgm8zvob4wq...@mail.gmail.com Small inefficiencies in the on-disk structure have a tendency to aggregate over time, at least when there is no possible way to reverse them. The bottom-up index deletion stuff is very effective as a backstop against index bloat, because things are generally very non-linear. The cost of an unnecessary page split is very high, and permanent. But we can make it cheap to *try* to avoid that using fairly simple heuristics. We can be reasonably confident that we're about to split the page unnecessarily, and use cues that ramp up the number of heap page accesses as needed. We ramp up during a bottom-up index deletion, as we manage to free some index tuples as a result of previous heap page accesses. This works very well because we can intervene very selectively. 
We aren't interested in deleting index tuples unless and until we really have to, and in general there tends to be quite a bit of free space to temporarily store extra version duplicates -- that's what most index pages look like, even on the busiest of databases. It's possible for the bottom-up index deletion mechanism to be invoked very infrequently, and yet make a huge difference. And when it fails to free anything, it fails permanently for that particular leaf page (because it splits) -- so now we have lots of space for future index tuple insertions that cover the original page's key space. We won't thrash. My intuition is that similar principles can be applied inside heapam. Failing to fit related versions on a heap page (having managed to do so for hours or days before that point) is more or less the heap page equivalent of a leaf page split from version churn (this is the pathology that bottom-up index deletion targets). For example, we could have a fall back mode that compresses old versions that is used if and only if heap pruning was attempted but then failed. We should always try to avoid migrating to a new heap page, because that amounts to a permanent solution to a temporary problem. We should perhaps make the
Re: pg_amcheck option to install extension
> On Apr 18, 2021, at 6:19 AM, Andrew Dunstan wrote: > > how about specifying pg_catalog as the schema instead of public? Done. v2-0001-Adding-install-missing-option-to-pg_amcheck.patch Description: Binary data — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: SQL-standard function body
... BTW, a dependency loop is also possible without using this feature, by abusing default-value expressions:

create function f1(x int, y int) returns int
  language sql as 'select $1 + $2';
create function f2(x int, y int default f1(1,2)) returns int
  language sql as 'select $1 + $2';
create or replace function f1(x int, y int default f2(11,12)) returns int
  language sql as 'select $1 + $2';

The actual use-case for that seems pretty thin, so we never bothered to worry about it before. But if we're going to build loop-breaking logic to handle function body dependencies, it should deal with this too. I think that all that's required is for the initial dummy function declaration to omit defaults as well as providing a dummy body. regards, tom lane
Re: SQL-standard function body
On Sun, Apr 18, 2021 at 03:08:44PM -0400, Tom Lane wrote: > Noah Misch writes: > > Should we be okay releasing v14 without support for breaking function > > dependency loops, or does that call for an open item? > > Oh! That should definitely be an open item. It doesn't seem > that hard to do something similar to what we do for views, > i.e. create a dummy function and replace it later. I added https://wiki.postgresql.org/index.php?title=PostgreSQL_14_Open_Items&type=revision&diff=35926&oldid=35925
Re: Planning time grows exponentially with levels of nested views
On Sun, Apr 18, 2021, at 22:14, Tom Lane wrote:
> "Joel Jacobson" writes:
> > I assumed the cost for each nested VIEW layer would grow linear,
> > but my testing shows it appears to grow exponentially:
>
> I think it's impossible to avoid less-than-O(N^2) growth on this sort
> of case. For example, the v2 subquery initially has RTEs for v2 itself
> plus v1. When we flatten v1 into v2, v2 acquires the RTEs from v1,
> namely v1 itself plus foo. Similarly, once vK-1 is pulled up into vK,
> there are going to be order-of-K entries in vK's rtable, and that stacking
> makes for O(N^2) work overall just in manipulating the rtable.
>
> We can't get rid of these rtable entries altogether, since all of them
> represent table privilege checks that the executor will need to do.
> It occurs to me though that we don't need the rte->subquery trees anymore
> once those are flattened, so maybe we could do something like the
> attached. For me, this reduces the slowdown in your example from
> O(N^3) to O(N^2).

Many thanks for explaining and the patch. I've tested the patch successfully. Planning time grows much slower now:

EXPLAIN ANALYZE SELECT * FROM v64;
- Planning Time: 14.981 ms
+ Planning Time: 2.802 ms

EXPLAIN ANALYZE SELECT * FROM v128;
- Planning Time: 109.407 ms
+ Planning Time: 11.595 ms

EXPLAIN ANALYZE SELECT * FROM v256;
- Planning Time: 1594.809 ms
+ Planning Time: 46.709 ms

Very nice. /Joel
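For anyone wanting to reproduce these numbers, a view stack like the one being measured can be generated along these lines (a sketch; the thread doesn't show the exact setup, and the base table name is assumed):

```sql
CREATE TABLE foo (a int);
CREATE VIEW v1 AS SELECT * FROM foo;

-- Each vK selects from v(K-1), so planning vK must flatten K levels.
DO $$
BEGIN
    FOR i IN 2..256 LOOP
        EXECUTE format('CREATE VIEW v%s AS SELECT * FROM v%s', i, i - 1);
    END LOOP;
END $$;

EXPLAIN ANALYZE SELECT * FROM v256;  -- compare the Planning Time line
```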
Re: Planning time grows exponentially with levels of nested views
I wrote: > If multiple references are actually possible then this'd break it. There > seem to be no such cases in the regression tests though, and I'm having a > hard time wrapping my brain around what would cause it. "git blame" > traces this text to my own commit f44639e1b, which has the log entry > Don't crash if subquery appears multiple times in jointree. This should > not happen anyway, but let's try not to get completely confused if it does > (due to rewriter bugs or whatever). > so I'm thinking that this was only hypothetical. Ah, found it. That was evidently a reaction to the immediately preceding commit (352871ac9), which fixed a rewriter bug that could lead to exactly the case of multiple jointree references to one RTE. I think this patch doesn't make things any worse for such a case though. If we re-introduced such a bug, the result would be an immediate null pointer crash while trying to process the second reference to a flattenable subquery. That's probably better for debuggability than what happens now, where we just silently process the duplicate reference. Anyway, I've stuck this into the next CF for future consideration. regards, tom lane PS: to save time for anyone else who wants to investigate this, it looks like the report mentioned in 352871ac9 was https://www.postgresql.org/message-id/007401c0860d%24bed809a0%241001a8c0%40archonet.com
Re: Old Postgresql version on i7-1165g7
Tom Lane wrote on 2021-04-13 17:45: Justin Pryzby writes: On Fri, Apr 09, 2021 at 04:28:25PM +0300, Yura Sokolov wrote: Occasionally I found I'm not able to `make check` old PostgreSQL versions. I've bisected between REL_11_0 and "Rename pg_rewind's copy_file_range()" and found 372728b0d49552641f0ea83d9d2e08817de038fa Replace our traditional initial-catalog-data format with a better design. This is the first commit where `make check` doesn't fail during initdb on my machine. This doesn't make much sense or help much, since 372728b doesn't actually change the catalogs, or any .c file. It could make sense if some part of the toolchain that was previously used to generate postgres.bki doesn't work right on that machine. Overall though I'd have thought that 372728b would increase not decrease our toolchain footprint. It also seems unlikely that a recent Ubuntu release would contain toolchain bugs that we hadn't already heard about. You used make clean too, right ? Really, when bisecting, you need to use "make distclean" or even "git clean -dfx" between steps, or you may get bogus results, because our makefiles aren't that great about tracking dependencies, especially when you move backwards in the history. So perhaps a more plausible theory is that this bisection result is wrong because you weren't careful enough. regards, tom lane Sorry for missing the mail for a week. I believe I cleaned before each step, since I'm building in an external directory and cleanup is just `rm * -r`. But I'll repeat the bisection tomorrow to be sure. I don't think it is really a PostgreSQL or toolchain bug. I believe it is some corner case that changed with the new Intel CPU. With regards, Yura Sokolov.
Re: Planning time grows exponentially with levels of nested views
[ redirecting to -hackers so the cfbot can see it ] "Joel Jacobson" writes: > I assumed the cost for each nested VIEW layer would grow linear, > but my testing shows it appears to grow exponentially: I think it's impossible to avoid less-than-O(N^2) growth on this sort of case. For example, the v2 subquery initially has RTEs for v2 itself plus v1. When we flatten v1 into v2, v2 acquires the RTEs from v1, namely v1 itself plus foo. Similarly, once vK-1 is pulled up into vK, there are going to be order-of-K entries in vK's rtable, and that stacking makes for O(N^2) work overall just in manipulating the rtable. We can't get rid of these rtable entries altogether, since all of them represent table privilege checks that the executor will need to do. It occurs to me though that we don't need the rte->subquery trees anymore once those are flattened, so maybe we could do something like the attached. For me, this reduces the slowdown in your example from O(N^3) to O(N^2). I'm slightly worried though by this comment earlier in pull_up_simple_subquery: /* * Need a modifiable copy of the subquery to hack on. Even if we didn't * sometimes choose not to pull up below, we must do this to avoid * problems if the same subquery is referenced from multiple jointree * items (which can't happen normally, but might after rule rewriting). */ If multiple references are actually possible then this'd break it. There seem to be no such cases in the regression tests though, and I'm having a hard time wrapping my brain around what would cause it. "git blame" traces this text to my own commit f44639e1b, which has the log entry Don't crash if subquery appears multiple times in jointree. This should not happen anyway, but let's try not to get completely confused if it does (due to rewriter bugs or whatever). so I'm thinking that this was only hypothetical. 
It's possible that we could do something similar in the sibling functions pull_up_simple_union_all etc, but I'm not sure it's worth troubling over. TBH, for the size of effect you're showing here, I wouldn't be worried at all; it's only because it seems to be a one-liner to improve it that I'm interested in doing something. regards, tom lane diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c index 62a1668796..f93c037778 100644 --- a/src/backend/optimizer/prep/prepjointree.c +++ b/src/backend/optimizer/prep/prepjointree.c @@ -1174,6 +1174,14 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte, Assert(root->placeholder_list == NIL); Assert(subroot->placeholder_list == NIL); + /* + * We no longer need the RTE's copy of the subquery's query tree. Getting + * rid of it saves nothing in particular so far as this level of query is + * concerned; but if this query level is in turn pulled up into a parent, + * we'd waste cycles copying the now-unused query tree. + */ + rte->subquery = NULL; + /* * Miscellaneous housekeeping. *
Re: More info on pg_stat_activity Wait Event Name when is DataFileRead
On Sat, Apr 17, 2021 at 1:58 PM PegoraroF10 wrote: > This long explanation was only to show, at least for me, that it would be > desirable to have additional information when Postgres is waiting for a > file. What if DataFileRead showed the relfilenode it's waiting for? I agree that this would be nice, but it's pretty much impossible to do it without adding quite a bit more overhead than the current system has. And it already has enough overhead to make Andres at least slightly grumpy, though admittedly a lot of things have enough overhead to make Andres grumpy, because he REALLY likes it when things go fast. :-) I suspect it's best to investigate problems like the one you're having using a tool like strace, which can provide way more detail than we could ever cram into a wait event, like the data actually read or written, timestamps for every operation, etc. But I also kind of wonder whether it really matters. If your system is getting stuck in a DataFileRead event for a long period of time, and assuming that's for real and not just some kind of reporting bug, it sounds a lot like you have a bad disk or a severely overloaded I/O subsystem. Because what else would lead to the system getting stuck inside read() for a long time? -- Robert Haas EDB: http://www.enterprisedb.com
Re: SQL-standard function body
Noah Misch writes: > Should we be okay releasing v14 without support for breaking function > dependency loops, or does that call for an open item? Oh! That should definitely be an open item. It doesn't seem that hard to do something similar to what we do for views, i.e. create a dummy function and replace it later. regards, tom lane
Re: SQL-standard function body
On Tue, Jun 30, 2020 at 02:51:38PM -0400, Tom Lane wrote: > The point remains that exposing the function body's dependencies will > constrain restore order far more than we are accustomed to see. It > might be possible to build examples that flat out can't be restored, > even granting that we teach pg_dump how to break dependency loops > by first creating the function with empty body and later redefining > it with the real body. (Admittedly, if that's possible then you > likely could make it happen with views too. But somehow it seems > more likely that people would create spaghetti dependencies for > functions than views.) Should we be okay releasing v14 without support for breaking function dependency loops, or does that call for an open item?

-- example
create function f() returns int language sql return 1;
create function g() returns int language sql return f();
create or replace function f() returns int language sql return coalesce(2, g());

-- but when a view can break the cycle, pg_dump still does so
create view v as select 1 as c;
create function f() returns int language sql return coalesce(0, (select count(*) from v));
create or replace view v as select f() as c;
Re: Re: Core dump happens when execute sql CREATE VIEW v1(c1) AS (SELECT ('4' COLLATE "C")::INT FROM generate_series(1, 10));
Yulin PEI writes: > After several tests, I found that this patch do not fix the bug well. What do you think is wrong with it? > So the attachment is my patch and it works well as far as I tested. This seems equivalent to the already-committed patch [1] except that it wastes a makeNode call in the coerce-to-uncollatable-type case. regards, tom lane [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=c402b02b9fb53aee2a26876de90a8f95f9a9be92
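For reference, the statement from the subject line, which the committed fix allows to succeed: the COLLATE applies to the text literal and is simply discarded once the cast yields a non-collatable integer.

```sql
-- Crashed before the fix while the view's output collations were examined.
CREATE VIEW v1(c1) AS
    (SELECT ('4' COLLATE "C")::int FROM generate_series(1, 10));

SELECT ('4' COLLATE "C")::int;  -- the collation has no effect on the result
```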
Re: "could not find pathkey item to sort" for TPC-DS queries 94-96
I wrote: > I think it's time for some refactoring of this code so that we can > actually share the logic. Accordingly, I propose the attached. After sleeping on it, here's an improved version that gets rid of an unnecessary assumption about ECs usually not containing both parallel-safe and parallel-unsafe members. I'd tried to do this yesterday but didn't like the amount of side-effects on createplan.c (caused by the direct call sites not being passed the "root" pointer). However, we can avoid refactoring createplan.c APIs by saying that it's okay to pass root = NULL to find_computable_ec_member if you're not asking it to check parallel safety. And there's not really a need to put a parallel-safety check into find_ec_member_matching_expr at all; that task can be left with the one caller that cares. regards, tom lane diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c index 0188c1e9a1..6627491519 100644 --- a/src/backend/optimizer/path/equivclass.c +++ b/src/backend/optimizer/path/equivclass.c @@ -35,6 +35,7 @@ static EquivalenceMember *add_eq_member(EquivalenceClass *ec, Expr *expr, Relids relids, Relids nullable_relids, bool is_child, Oid datatype); +static bool exprlist_member_ignore_relabel(Expr *node, List *exprs); static void generate_base_implied_equalities_const(PlannerInfo *root, EquivalenceClass *ec); static void generate_base_implied_equalities_no_const(PlannerInfo *root, @@ -769,6 +770,169 @@ get_eclass_for_sort_expr(PlannerInfo *root, return newec; } +/* + * find_ec_member_matching_expr + * Locate an EquivalenceClass member matching the given expr, if any; + * return NULL if no match. + * + * "Matching" is defined as "equal after stripping RelabelTypes". + * This is used for identifying sort expressions, and we need to allow + * binary-compatible relabeling for some cases involving binary-compatible + * sort operators. + * + * Child EC members are ignored unless they belong to given 'relids'. 
+ */ +EquivalenceMember * +find_ec_member_matching_expr(EquivalenceClass *ec, + Expr *expr, + Relids relids) +{ + ListCell *lc; + + /* We ignore binary-compatible relabeling on both ends */ + while (expr && IsA(expr, RelabelType)) + expr = ((RelabelType *) expr)->arg; + + foreach(lc, ec->ec_members) + { + EquivalenceMember *em = (EquivalenceMember *) lfirst(lc); + Expr *emexpr; + + /* + * We shouldn't be trying to sort by an equivalence class that + * contains a constant, so no need to consider such cases any further. + */ + if (em->em_is_const) + continue; + + /* + * Ignore child members unless they belong to the requested rel. + */ + if (em->em_is_child && + !bms_is_subset(em->em_relids, relids)) + continue; + + /* + * Match if same expression (after stripping relabel). + */ + emexpr = em->em_expr; + while (emexpr && IsA(emexpr, RelabelType)) + emexpr = ((RelabelType *) emexpr)->arg; + + if (equal(emexpr, expr)) + return em; + } + + return NULL; +} + +/* + * find_computable_ec_member + * Locate an EquivalenceClass member that can be computed from the + * expressions appearing in "exprs"; return NULL if no match. + * + * "exprs" can be either a list of bare expression trees, or a list of + * TargetEntry nodes. Either way, it should contain Vars and possibly + * Aggrefs and WindowFuncs, which are matched to the corresponding elements + * of the EquivalenceClass's expressions. As in find_ec_member_matching_expr, + * we ignore binary-compatible relabeling. + * + * Child EC members are ignored unless they belong to given 'relids'. + * Also, non-parallel-safe expressions are ignored if 'require_parallel_safe'. + * + * Note: some callers pass root == NULL for notational reasons. This is OK + * when require_parallel_safe is false. 
+ */ +EquivalenceMember * +find_computable_ec_member(PlannerInfo *root, + EquivalenceClass *ec, + List *exprs, + Relids relids, + bool require_parallel_safe) +{ + ListCell *lc; + + foreach(lc, ec->ec_members) + { + EquivalenceMember *em = (EquivalenceMember *) lfirst(lc); + List *exprvars; + ListCell *lc2; + + /* + * We shouldn't be trying to sort by an equivalence class that + * contains a constant, so no need to consider such cases any further. + */ + if (em->em_is_const) + continue; + + /* + * Ignore child members unless they belong to the requested rel. + */ + if (em->em_is_child && + !bms_is_subset(em->em_relids, relids)) + continue; + + /* + * Match if all Vars and quasi-Vars are available in "exprs". + */ + exprvars = pull_var_clause((Node *) em->em_expr, + PVC_INCLUDE_AGGREGATES | + PVC_INCLUDE_WINDOWFUNCS | + PVC_INCLUDE_PLACEHOLDERS); + foreach(lc2, exprvars) + { + if (!exprlist_member_ignore_relabel(lfirst(lc2), exprs)) +break; + } + list_free(exprvars); + if (lc2) + continue; /* we hit a no
Re: 2 questions about volatile attribute of pg_proc.
On Sun, Apr 18, 2021 at 9:08 AM Tom Lane wrote: > Isaac Morland writes: > > On Sun, 18 Apr 2021 at 11:36, Tom Lane wrote: > >> Are you familiar with the halting problem? I don't see any meaningful > >> difference here. > > > I think what is being suggested is akin to type checking, not solving the > > halting problem. > > Yeah, on further thought we'd be satisfied with a conservative > approximation, so that removes the theoretical-impossibility objection. > Still, there are a lot of remaining problems, as you note. > > Yeah, the type checking approach seems blocked by the "black box" nature of functions. A possibly more promising approach is for the top-level call to declare its expectations (which are set by the user) and during execution if that expectation is violated directly, or is reported as violated deeper in the call stack, the execution of the function fails with some kind of invalid state error. However, as with other suggestions of this nature, the fundamental blocker here is that to be particularly useful this kind of validation needs to happen by default (as opposed to opt-in) which risks breaking existing code. And so I foresee this request falling into the same category as those others - an interesting idea that could probably be made to work but by itself isn't worthwhile enough to go and introduce a fundamental change to the amount of "parental oversight" PostgreSQL tries to provide. David J.
Re: 2 questions about volatile attribute of pg_proc.
Isaac Morland writes: > On Sun, 18 Apr 2021 at 11:36, Tom Lane wrote: >> Are you familiar with the halting problem? I don't see any meaningful >> difference here. > I think what is being suggested is akin to type checking, not solving the > halting problem. Yeah, on further thought we'd be satisfied with a conservative approximation, so that removes the theoretical-impossibility objection. Still, there are a lot of remaining problems, as you note. regards, tom lane
Re: 2 questions about volatile attribute of pg_proc.
On Sun, 18 Apr 2021 at 11:36, Tom Lane wrote: > Andy Fan writes: > > We know volatile is very harmful for optimizers and it is the default > > value (and safest value) if the user doesn't provide that. Asking user > > to set the value is not a good experience, is it possible to > auto-generate > > the value for it rather than use the volatile directly for user defined > > function. I > > think it should be possible, we just need to scan the PlpgSQL_stmt to see > > if there > > is a volatile function? > > Are you familiar with the halting problem? I don't see any meaningful > difference here. > I think what is being suggested is akin to type checking, not solving the halting problem. Parse the function text, identify all functions it might call (without solving the equivalent of the halting problem to see if it actually does or could), and apply the most volatile value of called functions to the calling function. That being said, there are significant difficulties, including but almost certainly not limited to: - what happens if one modifies a called function after creating the calling function? - EXECUTE - a PL/PGSQL function's meaning depends on the search path in effect when it is called, unless it has a SET search_path clause or it fully qualifies all object references, so it isn't actually possible in general to determine what a function calls at definition time If the Haskell compiler is possible then what is being requested here is conceptually possible even if there are major issues with actually doing it in the Postgres context. The halting problem is not the problem here.
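The conservative, type-checking-like propagation Isaac describes can be sketched with ordinary SQL over a toy call graph (the tables and data below are hypothetical illustrations, not the system catalogs):

```sql
-- Toy model of the conservative inference: a caller's volatility is the
-- max ('i' < 's' < 'v', conveniently alphabetical) over every function it
-- can transitively reach. Anything unresolvable at definition time
-- (EXECUTE, search_path-dependent references) would have to be seeded 'v'.
CREATE TABLE fn (name text PRIMARY KEY, declared text);   -- leaf volatility
CREATE TABLE fn_call (caller text, callee text);

INSERT INTO fn VALUES ('lower', 'i'), ('now', 's'), ('random', 'v');
INSERT INTO fn_call VALUES ('wrapper', 'lower'), ('wrapper', 'now');

WITH RECURSIVE reach(caller, callee) AS (
    SELECT caller, callee FROM fn_call
    UNION
    SELECT r.caller, c.callee
    FROM reach r JOIN fn_call c ON c.caller = r.callee
)
SELECT r.caller, max(f.declared) AS inferred    -- wrapper infers 's'
FROM reach r JOIN fn f ON f.name = r.callee
GROUP BY r.caller;
```

This also makes the first listed difficulty concrete: the inferred value is only valid until any reachable callee is redefined, so the inference would need to be redone (or invalidated) on ALTER or CREATE OR REPLACE FUNCTION.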
Re: 2 questions about volatile attribute of pg_proc.
Andy Fan writes: > We know volatile is very harmful for optimizers and it is the default > value (and safest value) if the user doesn't provide that. Asking user > to set the value is not a good experience, is it possible to auto-generate > the value for it rather than use the volatile directly for user defined > function. I > think it should be possible, we just need to scan the PlpgSQL_stmt to see > if there > is a volatile function? Are you familiar with the halting problem? I don't see any meaningful difference here. regards, tom lane
Re: Bogus collation version recording in recordMultipleDependencies
Julien Rouhaud writes: > On Sat, Apr 17, 2021 at 10:01:53AM +1200, Thomas Munro wrote: >> It seems to me that there are two things that would be needed to >> salvage this for PG14: (1) deciding that we're unlikely to come up >> with a better idea than using pg_depend for this (following the >> argument that it'd only create duplication to have a parallel >> dedicated catalog), (2) fixing any remaining flaws in the dependency >> analyser code. I'll look into the details some more on Monday. > So IIUC the issue here is that the code could previously record useless > collation version dependencies in somes cases, ... > The new situation is now that the code can record too few version dependencies > leading to false negative detection, which is way more problematic. I'm not sure that an error in this direction is all that much more problematic than the other direction. If it's okay to claim that indexes need to be rebuilt when they don't really, then we could just drop this entire overcomplicated infrastructure and report that all indexes need to be rebuilt after any collation version change. But in any case you're oversimplifying tremendously. The previous code is just as capable of errors of omission, because it was inquiring into the wrong composite types, ie those of leaf expression nodes. The ones we'd need to look at are the immediate inputs of record_eq and siblings. Here are a couple of examples where the leaf types are unhelpful: ... where row(a,b,c)::composite_type < row(d,e,f)::composite_type; ... where function_returning_composite(...) < function_returning_composite(...); And even if we do this, we're not entirely in the clear in an abstract sense, because this only covers cases in which an immediate input is of a known named composite type. Cases dealing in anonymous RECORD types simply can't be resolved statically. It might be that that can't occur in the specific situation of CREATE INDEX expressions, but I'm not 100% sure of it. 
The apparent counterexample of ... where row(a,b) < row(a,c) isn't one because we parse that as RowCompareExpr not an application of record_lt. > We agreed that having possible false positive dependencies was acceptable for > the initial implementation and that we will improve it in later versions, as > otherwise the alternative is to reindex everything without getting any > warning, > which clearly isn't better anyway. [ shrug... ] You have both false positives and false negatives in the thing as it stood before f24b15699. I'm not convinced that it's possible to completely avoid either issue via static analysis. I'm inclined to think that false negatives around record_eq-like functions are not such a problem for real index definitions, and we'd be better off with fewer false positives. But it's all judgment calls. regards, tom lane
Re: 2 questions about volatile attribute of pg_proc.
On Sun, Apr 18, 2021 at 17:06, Andy Fan wrote: > Hi: > > We know volatile is very harmful for optimizers and it is the default > value (and safest value) if the user doesn't provide that. Asking user > to set the value is not a good experience, is it possible to > auto-generate > the value for it rather than use the volatile directly for user defined > function. I > think it should be possible, we just need to scan the PlpgSQL_stmt to see > if there > is a volatile function? > plpgsql_check does this check - its performance check reports whether a function can be marked as stable: https://github.com/okbob/plpgsql_check I don't think this can be done automatically - plpgsql does not check the objects used inside a function at registration time. You can use objects and functions that don't exist at CREATE FUNCTION time, and you would need to know this info before optimization time. So if we implemented this check automatically, planning time could increase a lot. Regards Pavel > The second question "It is v for “volatile” functions, whose results might > change at any time. > (Use v also for functions with side-effects, so that calls to them cannot > get optimized away.)" > I think they are different semantics. One of the results is volatile > functions can't be removed > by remove_unused_subquery_output even if it doesn't have side effects. for > example: > select b from (select an_expensive_random(), b from t); Is it by design > on purpose? > > > -- > Best Regards > Andy Fan (https://www.aliyun.com/) >
2 questions about volatile attribute of pg_proc.
Hi: We know volatile is very harmful for optimizers, and it is the default (and safest) value if the user doesn't provide one. Asking the user to set the value is not a good experience; is it possible to auto-generate the value rather than using volatile directly for user-defined functions? I think it should be possible - we just need to scan the PlpgSQL_stmt to see whether it calls a volatile function. The second question: the docs say "It is v for “volatile” functions, whose results might change at any time. (Use v also for functions with side-effects, so that calls to them cannot get optimized away.)" I think those are different semantics. One consequence is that a volatile function can't be removed by remove_unused_subquery_output even if it doesn't have side effects, for example: select b from (select an_expensive_random(), b from t); Is that by design, on purpose? -- Best Regards Andy Fan (https://www.aliyun.com/)
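A self-contained illustration of the second question (the table and function definitions below are mine, for reproduction; the query is from the mail):

```sql
CREATE TABLE t (a int, b int);
CREATE FUNCTION an_expensive_random() RETURNS float8
    LANGUAGE sql VOLATILE AS 'SELECT random()';

-- The subquery's first output column is never referenced, but because the
-- function is marked VOLATILE (possible side effects), the planner keeps
-- evaluating it per row:
EXPLAIN SELECT b FROM (SELECT an_expensive_random(), b FROM t) s;
-- Were the function STABLE or IMMUTABLE instead, the unused output could
-- be dropped when the subquery is flattened.
```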
Consider parent's stats for set_append_rel_size.
Hi: I want to talk about the impact of initial partition pruning on set_append_rel_size and create_append_path; in this thread I will focus on set_append_rel_size only. Given the below example:

CREATE TABLE p (part_key int, v int) PARTITION BY RANGE (part_key);
CREATE TABLE p_1 PARTITION OF p FOR VALUES FROM (0) TO (10);
CREATE TABLE p_2 PARTITION OF p FOR VALUES FROM (10) TO (20);
CREATE TABLE p_3 PARTITION OF p FOR VALUES FROM (20) TO (30);
INSERT INTO p SELECT i % 30, i FROM generate_series(1, 300)i;
set plan_cache_mode to force_generic_plan;
prepare s as select * from p where part_key = $1;
explain analyze execute s(2);

Then we get estimated RelOptInfo.rows = 30, but actually it is 10 rows.

explain analyze execute s(2);
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Append  (cost=0.00..6.90 rows=30 width=8) (actual time=0.019..0.042 rows=10 loops=1)
   Subplans Removed: 2
   ->  Seq Scan on p_1  (cost=0.00..2.25 rows=10 width=8) (actual time=0.017..0.038 rows=10 loops=1)
         Filter: (part_key = $1)
         Rows Removed by Filter: 90
 Planning Time: 0.885 ms
 Execution Time: 0.156 ms
(7 rows)

Actually there are two issues here: one is RelOptInfo->rows, which is set by set_append_rel_size; the other is appendPath->path.rows, which is set in create_append_path. They are two independent pieces of data. (When we estimate the rows of a joinrel, we consider only RelOptInfo.rows, not Path.rows.) In set_append_rel_size, we push the quals down to each child relation and sum the per-child rows. The children's stats work better than the parent's stats if we know exactly which partitions we will access, but this strategy fails when initial pruning happens, as above. So I think considering the parent's stats for the initial-prune case might be a good solution (Ashutosh mentioned global stats for this a long time ago [1]). So I want to refactor the code like this: a) should_use_parent_stats(..): decides which stats we should use for an AppendRel. b) set_append_rel_size_locally: just do what we currently do. c).
set_append_rel_size_globally: we calculate the quals' selectivity at the AppendRel level and set the rows to AppendRel->tuples * sel.

More about the should_use_parent_stats function:
1. If there are no quals usable for initial partition pruning, we use the children's stats.
2. If we have quals for initial partition pruning and the left op is not used in plan-time pruning, we use the parent's stats. For example: (part_key = 2 AND part_key > $1).

However, when I was coding it I found that detecting "quals for initial partition prune" is not so easy, so I doubt whether it is worth the trouble of deciding which method to use. Attached is just a PoC version which uses the parent's stats all the time.

Author: 一挃
Date: Sun Apr 18 22:02:54 2021 +0800

    Currently set_append_rel_size doesn't consider initial partition pruning, so the estimated size can sometimes be badly wrong. In this patch I set the rows = parentrel->tuples * clause selectivity. We can lose some accuracy when initial partition pruning doesn't happen at all, but generally I think it would be OK. Another strategy is to check whether initial partition pruning can happen: if we are sure about that, we use the above way; otherwise we can still use the local-stats strategy.

[1] https://www.postgresql.org/message-id/CAExHW5t5Q7JuUW28QMRO7szuHcbsfx4M9%3DWL%2Bup40h3PCd7dXw%40mail.gmail.com

-- Best Regards Andy Fan (https://www.aliyun.com/)

v1-0001-Currently-the-set_append_rel_size-doesn-t-conside.patch Description: Binary data
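The gap between the two strategies can be shown with toy arithmetic (the numbers come from the 3-partition example above: 300 tuples, part_key = $1 actually matching 10 rows):

```sql
-- Contrast of the two estimates for the generic plan, which cannot know
-- which partitions initial pruning will remove:
SELECT 10 + 10 + 10       AS child_sum_estimate,     -- today: every child
                                                     -- contributes, giving 30
       300 * (10 / 300.0) AS parent_level_estimate;  -- proposed: tuples * sel,
                                                     -- giving 10 (the actual)
```

The child-sum path overestimates by a factor equal to the number of unpruned-at-plan-time partitions, which then feeds into joinrel sizing via RelOptInfo.rows.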
Re: proposal - log_full_scan
On Sun, Apr 18, 2021 at 14:28, Julien Rouhaud wrote: > On Sun, Apr 18, 2021 at 06:21:56AM +0200, Pavel Stehule wrote: > > > > The extension like pg_qualstat is good, but it does different work. > > Yes definitely. It was just an idea if you needed something right now that > could more or less do what you needed, not saying that we shouldn't > improve the > core :) > > > In > > complex applications I need to detect buggy (forgotten) queries - last > week > > I found two queries over bigger tables without predicates. So the > qualstat > > doesn't help me. > > Also not totally helpful but powa was created to detect problematic > queries in > such cases. It wouldn't say if it's because of a seq scan or not (so yes > again > we need to improve that), but it would give you the slowest (or top > consumer > for any resource) for a given time interval. > > > This is an application for a government with few (but for > > government typical) specific: 1) the life cycle is short (one month), 2) > > there is not slow start - from first moment the application will be used > by > > more hundred thousands people, 3) the application is very public - so any > > issues are very interesting for press and very unpleasant for politics, > and > > in next step for all suppliers (there are high penalty for failures), and > > an admins are not happy from external extensions, 4) the budget is not > too > > big - there is not any performance testing environment > > > > First stages are covered well today. We can log and process very slow > > queries, and fix it immediately - with CREATE INDEX CONCURRENTLY I can do > > it well on production servers too without high risk. > > > > But the detection of some bad not too slow queries is hard. And as an > > external consultant I am not able to install any external extensions to > the > > production environment for fixing some hot issues, The risk is not > > acceptable for project managers and I understand.
So I have to use only > > tools available in Postgres. > > Yes I agree that having additional and more specialized tool in core > postgres > would definitely help in similar scenario. > > I think that having some kind of threshold for seq scan (like the mentioned > auto_explain.log_seqscan = XXX) in auto_explain would be the best > approach, as > you really need the plan to know why a seq scan was chosen and if it was a > reasonable choice or not. > I would like to write this for core and for auto_explain too. I have been in situations where I could not use even auto_explain, although that extension is widely used and so the risk is low. When I detect the query, I can run the explanation manually. But sure, I think it can work well inside auto_explain. Regards Pavel
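For reference, a hypothetical postgresql.conf fragment, assuming the proposed knob existed. The name auto_explain.log_seqscan comes from the proposal upthread; it does not exist in any released version, and its threshold semantics were left open ("XXX") in the discussion:

```
shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = '500ms'   # existing knob: log plans of slow queries
#auto_explain.log_seqscan = 10000         # proposed knob (not yet implemented);
                                          # would log plans containing a seq scan
                                          # past some threshold
```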
Re: Replication slot stats misgivings
On Sun, Apr 18, 2021 at 9:02 AM vignesh C wrote: > > On Sun, Apr 18, 2021 at 8:43 AM Amit Kapila wrote: > > > > On Sun, Apr 18, 2021 at 7:36 AM vignesh C wrote: > > > > > > On Sun, Apr 18, 2021 at 3:51 AM Tom Lane wrote: > > > > > > > > I wrote: > > > > > The buildfarm suggests that this isn't entirely stable: > > > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=anole&dt=2021-04-17%2011%3A14%3A49 > > > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bichir&dt=2021-04-17%2016%3A30%3A15 > > > > > > > > Oh, I missed that hyrax is showing the identical symptom: > > > > > > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2021-04-16%2007%3A05%3A44 > > > > > > > > So you might try CLOBBER_CACHE_ALWAYS to see if you can reproduce it > > > > that way. > > > > > > > > > > I will try to check and identify why it is failing. > > > > > > > I think the failure is due to the reason that in the new tests after > > reset, we are not waiting for the stats message to be delivered as we > > were doing in other cases. Also, for the new test (non-spilled case), > > we need to decode changes as we are doing for other tests, otherwise, > > it will show the old stats. > > I also felt that is the reason for the failure, I will fix and post a > patch for this. Attached a patch which includes the changes for the fix. I have moved the non-spilled transaction test to reduce the steps which reduces calling pg_logical_slot_get_changes before this test. 
Regards, Vignesh diff --git a/contrib/test_decoding/expected/stats.out b/contrib/test_decoding/expected/stats.out index bc8e601eab..c537e38bfe 100644 --- a/contrib/test_decoding/expected/stats.out +++ b/contrib/test_decoding/expected/stats.out @@ -51,20 +51,15 @@ BEGIN extract(epoch from clock_timestamp() - start_time); END $$ LANGUAGE plpgsql; --- spilling the xact -BEGIN; -INSERT INTO stats_test SELECT 'serialize-topbig--1:'||g.i FROM generate_series(1, 5000) g(i); -COMMIT; -SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL, 'skip-empty-xacts', '1'); +-- non-spilled xact +INSERT INTO stats_test values(1); +SELECT count(*) FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'skip-empty-xacts', '1'); count --- - 5002 + 3 (1 row) --- Check stats, wait for the stats collector to update. We can't test the --- exact stats count as that can vary if any background transaction (say by --- autovacuum) happens in parallel to the main transaction. -SELECT wait_for_decode_stats(false, true); +SELECT wait_for_decode_stats(false, false); wait_for_decode_stats --- @@ -73,17 +68,17 @@ SELECT wait_for_decode_stats(false, true); SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots; slot_name| spill_txns | spill_count | total_txns | total_bytes -++-++- - regression_slot | t | t | t | t + regression_slot | f | f | t | t (1 row) --- reset the slot stats, and wait for stats collector to reset +-- reset the slot stats, and wait for stats collector's total txn to reset SELECT pg_stat_reset_replication_slot('regression_slot'); pg_stat_reset_replication_slot (1 row) -SELECT wait_for_decode_stats(true, true); +SELECT wait_for_decode_stats(true, false); wait_for_decode_stats --- @@ -95,13 +90,19 @@ SELECT slot_name, spill_txns, spill_count, total_txns, total_bytes FROM pg_stat_ regression_slot | 0 | 0 | 0 | 0 (1 row) --- decode and 
check stats again. +-- spilling the xact +BEGIN; +INSERT INTO stats_test SELECT 'serialize-topbig--1:'||g.i FROM generate_series(1, 5000) g(i); +COMMIT; SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL, 'skip-empty-xacts', '1'); count --- 5002 (1 row) +-- Check stats, wait for the stats collector to update. We can't test the +-- exact stats count as that can vary if any background transaction (say by +-- autovacuum) happens in parallel to the main transaction. SELECT wait_for_decode_stats(false, true); wait_for_decode_stats --- @@ -114,24 +115,42 @@ SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, regression_slot | t | t | t | t (1 row) +-- reset the slot stats, and wait for stats collector to reset SELECT pg_stat_reset_replication_slot('regression_slot'); pg_stat_reset_replication_slot (1 row) --- non-spilled xact -INSERT INTO stats_test values(generate_series(1, 10)); -SELECT wait_for_decode_stats(false, false); +SELECT wait_for_decode_stats(true, true); wait_for_decode_stats --- (
Re: Replication slot stats misgivings
On Sun, Apr 18, 2021 at 12:13 PM Amit Kapila wrote: > > On Sun, Apr 18, 2021 at 7:36 AM vignesh C wrote: > > > > On Sun, Apr 18, 2021 at 3:51 AM Tom Lane wrote: > > > > > > I wrote: > > > > The buildfarm suggests that this isn't entirely stable: > > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=anole&dt=2021-04-17%2011%3A14%3A49 > > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bichir&dt=2021-04-17%2016%3A30%3A15 > > > > > > Oh, I missed that hyrax is showing the identical symptom: > > > > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2021-04-16%2007%3A05%3A44 > > > > > > So you might try CLOBBER_CACHE_ALWAYS to see if you can reproduce it > > > that way. > > > > > > > I will try to check and identify why it is failing. > > > > I think the failure is due to the reason that in the new tests after > reset, we are not waiting for the stats message to be delivered as we > were doing in other cases. Also, for the new test (non-spilled case), > we need to decode changes as we are doing for other tests, otherwise, > it will show the old stats. > Yes, also the following expectation in expected/stats.out is wrong: SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots; slot_name| spill_txns | spill_count | total_txns | total_bytes -++-++- regression_slot | f | f | t | t (1 row) We should expect all values are 0. Please find attached the patch. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/ fix_test_decoding_test.patch Description: Binary data
Re: pg_amcheck option to install extension
On 4/17/21 3:43 PM, Mark Dilger wrote: > >> On Apr 16, 2021, at 11:06 AM, Andrew Dunstan wrote: >> >> >> Hi, >> >> Peter Geoghegan suggested that I have the cross version upgrade checker >> run pg_amcheck on the upgraded module. This seemed to me like a good >> idea, so I tried it, only to find that it refuses to run unless the >> amcheck extension is installed. That's fair enough, but it also seems to >> me like we should have an option to have pg_amcheck install the >> extension if it's not present, by running something like 'create >> extension if not exists amcheck'. Maybe in such a case there could also >> be an option to drop the extension when pg_amcheck's work is done - I >> haven't thought through all the implications. >> >> Given pg_amcheck is a new piece of work I'm not sure if we can sneak >> this in under the wire for release 14. I will certainly undertake to >> review anything expeditiously. I can work around this issue in the >> buildfarm, but it seems like something other users are likely to want. > We cannot quite use a "create extension if not exists amcheck" command, as we > clear the search path and so must specify the schema to use. Should we > instead avoid clearing the search path for this? What are the security > implications of using the first schema of the search path? > > When called as `pg_amcheck --install-missing`, the implementation in the > attached patch runs per database being checked a "create extension if not > exists amcheck with schema public". If called as `pg_amcheck > --install-missing=foo`, it instead runs "create extension if not exists > amcheck with schema foo` having escaped "foo" appropriately for the given > database. There is no option to use different schemas for different > databases. Nor is there any option to use the search path. If somebody > needs that, I think they can manage installing amcheck themselves. How about specifying pg_catalog as the schema instead of public?
I'd like to get this nailed down quickly, > as it is unclear to me that we should even be doing this so late in the > development cycle. I'm ok with or without - I'll just have the buildfarm client pull a list of databases and install the extension in all of them. > > I'd also like your impressions on whether we're likely to move > contrib/amcheck into core anytime soon. If so, is it worth adding an option > that we'll soon need to deprecate? I think if it stays as an extension it will stay in contrib. But it sure feels very odd to have a core bin program that relies on a contrib extension. It seems one or the other is misplaced. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
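Per Mark's description, the per-database command that --install-missing would run amounts to the following (the pg_catalog variant reflects Andrew's suggestion, not the patch's default):

```sql
-- Default form described in the patch:
CREATE EXTENSION IF NOT EXISTS amcheck WITH SCHEMA public;
-- Andrew's suggested alternative:
CREATE EXTENSION IF NOT EXISTS amcheck WITH SCHEMA pg_catalog;
```

Since the tool clears the search path, spelling out the schema explicitly is what makes the IF NOT EXISTS form usable here at all.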
Re: proposal - log_full_scan
On Sun, Apr 18, 2021 at 06:21:56AM +0200, Pavel Stehule wrote: > > The extension like pg_qualstat is good, but it does different work. Yes definitely. It was just an idea if you needed something right now that could more or less do what you needed, not saying that we shouldn't improve the core :) > In > complex applications I need to detect buggy (forgotten) queries - last week > I found two queries over bigger tables without predicates. So the qualstat > doesn't help me. Also not totally helpful but powa was created to detect problematic queries in such cases. It wouldn't say if it's because of a seq scan or not (so yes again we need to improve that), but it would give you the slowest (or top consumer for any resource) for a given time interval. > This is an application for a government with few (but for > government typical) specific: 1) the life cycle is short (one month), 2) > there is not slow start - from first moment the application will be used by > more hundred thousands people, 3) the application is very public - so any > issues are very interesting for press and very unpleasant for politics, and > in next step for all suppliers (there are high penalty for failures), and > an admins are not happy from external extensions, 4) the budget is not too > big - there is not any performance testing environment > > First stages are covered well today. We can log and process very slow > queries, and fix it immediately - with CREATE INDEX CONCURRENTLY I can do > it well on production servers too without high risk. > > But the detection of some bad not too slow queries is hard. And as an > external consultant I am not able to install any external extensions to the > production environment for fixing some hot issues, The risk is not > acceptable for project managers and I understand. So I have to use only > tools available in Postgres. Yes I agree that having additional and more specialized tool in core postgres would definitely help in similar scenario. 
I think that having some kind of threshold for seq scan (like the mentioned auto_explain.log_seqscan = XXX) in auto_explain would be the best approach, as you really need the plan to know why a seq scan was chosen and if it was a reasonable choice or not.
Fix dropped object handling in pg_event_trigger_ddl_commands
Hello,

when creating an event trigger for ddl_command_end that calls
pg_event_trigger_ddl_commands(), certain statements will cause the
trigger to fail with a cache lookup error. The error happens on master,
13 and 12; I didn't test any previous versions.

trg=# ALTER TABLE t ALTER COLUMN f1 SET DATA TYPE bigint, ALTER COLUMN f1 DROP IDENTITY;
ERROR:  XX000: cache lookup failed for relation 13476892
CONTEXT:  PL/pgSQL function ddl_end() line 5 at FOR over SELECT rows
LOCATION:  getRelationTypeDescription, objectaddress.c:4178

For the ALTER ... SET DATA TYPE we create a command to adjust the
sequence, which gets recorded in the event trigger command list. This
leads to the described failure when the sequence is dropped as part of
another ALTER TABLE subcommand and information about the sequence can no
longer be looked up.

To reproduce:

CREATE OR REPLACE FUNCTION ddl_end() RETURNS event_trigger AS $$
DECLARE
    r RECORD;
BEGIN
    FOR r IN SELECT * FROM pg_event_trigger_ddl_commands() LOOP
        RAISE NOTICE 'ddl_end: % %', r.command_tag, r.object_type;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER ddl_end ON ddl_command_end EXECUTE PROCEDURE ddl_end();

CREATE TABLE t(f1 int NOT NULL GENERATED ALWAYS AS IDENTITY);
ALTER TABLE t ALTER COLUMN f1 DROP IDENTITY, ALTER COLUMN f1 SET DATA TYPE bigint;

I tried really hard to find a way to detect this error earlier, but
since the subcommands are processed independently I couldn't come up
with a non-invasive version. Someone more familiar with this code might
have an idea for a better solution.

Any thoughts?
https://www.postgresql.org/message-id/CAMCrgp39V7JQA_Gc+JaEZV3ALOU1ZG=pwyk3odptq7f6z0j...@mail.gmail.com

--
Regards, Sven Klemm

From 4faf761528c735bd808e7ea4d9f356710e3a8ed0 Mon Sep 17 00:00:00 2001
From: Sven Klemm
Date: Sat, 17 Apr 2021 21:54:03 +0200
Subject: [PATCH v1] Fix pg_event_trigger_ddl_commands

If an object is modified that is dropped in the same command we might
end up in a position where we generated a message but can no longer
look up the object information because the object got dropped.
E.g. ALTER TABLE ALTER COLUMN, DROP IDENTITY
---
 src/backend/commands/event_trigger.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 5bde507c75..f7b2c30f82 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -1936,8 +1936,17 @@ pg_event_trigger_ddl_commands(PG_FUNCTION_ARGS)
 			else if (cmd->type == SCT_AlterTSConfig)
 				addr = cmd->d.atscfg.address;

-			type = getObjectTypeDescription(&addr, false);
-			identity = getObjectIdentity(&addr, false);
+			/*
+			 * If an object is modified that is dropped in the same
+			 * command we might end up in a position where we
+			 * generated a message but can no longer look up the
+			 * object information because the object got dropped.
+			 * E.g. ALTER TABLE ALTER COLUMN, DROP IDENTITY
+			 */
+			type = getObjectTypeDescription(&addr, true);
+			identity = getObjectIdentity(&addr, true);
+			if (!identity)
+				continue;

 			/*
 			 * Obtain schema name, if any ("pg_temp" if a temp
--
2.30.0
Re: Bogus collation version recording in recordMultipleDependencies
On Sat, Apr 17, 2021 at 10:01:53AM +1200, Thomas Munro wrote:
> On Sat, Apr 17, 2021 at 8:39 AM Tom Lane wrote:
> > Per the changes in collate.icu.utf8.out, this gets rid of
> > a lot of imaginary collation dependencies, but it also gets
> > rid of some arguably-real ones.  In particular, calls of
> > record_eq and its siblings will be considered not to have
> > any collation dependencies, although we know that internally
> > those will look up per-column collations of their input types.
> > We could imagine special-casing record_eq etc here, but that
> > sure seems like a hack.
>
> [...]
>
> > So I wonder if, rather than continuing to pursue this right now,
> > we shouldn't revert 257836a75 and try again later with a new design
> > that doesn't try to commandeer the existing dependency infrastructure.
> > We might have a better idea about what to do on Windows by the time
> > that's done, too.
>
> It seems to me that there are two things that would be needed to
> salvage this for PG14: (1) deciding that we're unlikely to come up
> with a better idea than using pg_depend for this (following the
> argument that it'd only create duplication to have a parallel
> dedicated catalog), (2) fixing any remaining flaws in the dependency
> analyser code.  I'll look into the details some more on Monday.

So IIUC the issue here is that the code could previously record useless
collation version dependencies in some cases, which could lead to false
positive possible-corruption messages (and of course additional bloat in
pg_depend). False positive messages can't be avoided anyway, as a
collation version update may not corrupt the actually indexed set of
data, especially for glibc. But with the infrastructure as-is, advanced
users can look into the new version's changes and choose to ignore
changes for a specific set of collations, which is way easier to do with
the recorded dependencies.
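For illustration, a sketch of how such recorded dependencies could be
inspected in the development branch under discussion, where the
collation version was stored in pg_depend.refobjversion (a column added
by the patch series being debated here, later reworked, so treat the
column name as an assumption tied to this thread rather than a stable
interface):

-- List indexes with a recorded collation version dependency, showing
-- the collation and the version captured when the index was built.
SELECT objid::regclass      AS index,
       refobjid::regcollation AS collation,
       refobjversion        AS recorded_version
FROM pg_depend
WHERE refclassid = 'pg_collation'::regclass
  AND refobjversion IS NOT NULL;

Comparing recorded_version against the currently reported collation
version is what lets a user decide, per collation, whether a reindex is
actually needed.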
The new situation is that the code can now record too few version
dependencies, leading to false negative detection, which is way more
problematic. This was previously discussed around [1]. Quoting Thomas:

> To state more explicitly what's happening here, we're searching the
> expression trees for subexpressions that have a collation as part of
> their static type.  We don't know which functions or operators are
> actually affected by the collation, though.  For example, if an
> expression says "x IS NOT NULL" and x happens to be a subexpression of
> a type with a particular collation, we don't know that this
> expression's value can't possibly be affected by the collation version
> changing.  So, the system will nag you to rebuild an index just
> because you mentioned it, even though the index can't be corrupted.
> To do better than that, I suppose we'd need declarations in the
> catalog to say which functions/operators are collation sensitive.

We agreed that having possible false positive dependencies was
acceptable for the initial implementation and that we would improve it
in later versions, as the alternative is to reindex everything without
getting any warning, which clearly isn't better. FTR we had the same
agreement to not handle specific AMs that don't care about collations
(like hash or bloom) in [2], even though I provided a patch to handle
that case ([3]) which was dropped later on ([4]).

Properly and correctly handling collation version dependencies in
expressions is a hard problem and will definitely require additional
fields in pg_proc, so we clearly can't add that in pg14. So yes, we have
to decide whether we want to keep the feature in pg14 with the known
limitations (and in that case probably revert f24b15699, possibly
improving the documentation on the possibility of false positives), or
revert it entirely.
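To make Thomas's "x IS NOT NULL" point concrete with a hypothetical
schema (the table and index names are made up for illustration):

-- The column has a collatable type with an explicit collation.
CREATE TABLE docs (title text COLLATE "de_DE");

-- The indexed expression only tests for NULL, so its value cannot
-- depend on how "de_DE" sorts. But a tree-walking analyzer that only
-- looks at static types sees a collatable subexpression and records a
-- "de_DE" version dependency anyway, producing a false-positive
-- "please reindex" warning after a collation version change.
CREATE INDEX docs_has_title_idx ON docs ((title IS NOT NULL));

Doing better would require per-function collation-sensitivity
declarations in the catalog, which is exactly the pg_proc work deferred
beyond pg14.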
Unsurprisingly, I think that the feature as-is is already a significant
improvement and can be improved incrementally, so my vote is to keep it
in pg14. And just to be clear, I'm volunteering to work on the
expression problem and all other related improvements for the next
version, whether the current feature is reverted or not.

[1]: https://www.postgresql.org/message-id/CA%2BhUKGK8CwBcTcXWL2kUjpHT%2B6t2hEFCzkcZ-Z7xXbz%3DC4NLCQ%40mail.gmail.com
[2]: https://www.postgresql.org/message-id/13b0c950-80f9-4c10-7e0f-f59feac56a98%402ndquadrant.com
[3]: https://www.postgresql.org/message-id/20200908144507.GA57691%40nol
[4]: https://www.postgresql.org/message-id/CA%2BhUKGKHj4aYmmwKZdZjkD%3DCWRmn%3De6UsS7S%2Bu6oLrrp0orgsw%40mail.gmail.com